Quickstart

canpy takes a Python project and produces one typed artifact: its symbol table, call graph, and framework entrypoints. The three steps below install the CLI, run it against a project, and read the result. The rest of the page emits the same analysis into a Neo4j property graph.

  1. Install the CLI.

    Terminal window
    pip install codeanalyzer-python

    That installs the canpy command. Jedi and Tree-sitter ship with the package; CodeQL is downloaded on demand only if you opt in with --codeql.

  2. Run it against a project.

    Point --input at any Python project root and --output at a directory for the result.

    Terminal window
    canpy --input ./my-python-project --output ./out

    On the first run, canpy creates a virtual environment under .codeanalyzer/, installs the project’s dependencies into it, walks every .py file, and writes ./out/analysis.json. JSON is the default output target, so this is equivalent to passing --emit json.

  3. Read the result.

    analysis.json is a single PyApplication object with three top-level keys.

    Terminal window
    jq 'keys' ./out/analysis.json
    # [ "call_graph", "entrypoints", "symbol_table" ]
    jq '.symbol_table | length' ./out/analysis.json # modules analyzed
    jq '.call_graph | length' ./out/analysis.json # call edges

    That’s it — a directory of source files is now a typed, queryable model of the program.
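If jq is not available, the same three checks take a few lines of standard-library Python. This is a minimal sketch that assumes only what the steps above show: analysis.json holds the call_graph, entrypoints, and symbol_table keys, and Python's len() counts their entries the same way jq's length does.

```python
import json
from pathlib import Path


def load_analysis(out_dir: str = "./out") -> dict:
    """Read the analysis.json that canpy wrote into --output."""
    with (Path(out_dir) / "analysis.json").open(encoding="utf-8") as f:
        return json.load(f)


def summarize(app: dict) -> dict:
    """Mirror the jq checks: sorted top-level keys, module count, edge count."""
    return {
        "keys": sorted(app),                   # jq 'keys' also returns sorted keys
        "modules": len(app["symbol_table"]),   # modules analyzed
        "call_edges": len(app["call_graph"]),  # call edges
    }
```

After step 2, print(summarize(load_analysis("./out"))) reports the same three facts as the jq commands.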

The call graph is a flat list of source -> target edges keyed by callable signature, so it drops straight into networkx:

reachable.py
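import json
import networkx as nx

# Load the analysis written by `canpy --output ./out`.
with open("./out/analysis.json", encoding="utf-8") as f:
    app = json.load(f)

# One edge per call: source and target are callable signatures.
g = nx.DiGraph()
for edge in app["call_graph"]:
    g.add_edge(edge["source"], edge["target"])
print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")

# Is a sink reachable from an entrypoint? A graph query, not a guess.
# entry_sig and sink_sig are callable signatures from your project;
# nx.has_path raises nx.NodeNotFound if either is not in the graph.
# print(nx.has_path(g, entry_sig, sink_sig))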
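nx.has_path answers one entrypoint/sink pair at a time. To ask which callables any of several entrypoints can reach, a breadth-first search over the same edge list is enough and needs only the standard library. This is a sketch: it takes the starting signatures as plain strings you supply, because the shape of the entrypoints key is not covered on this page.

```python
from __future__ import annotations

from collections import deque
from typing import Iterable


def reachable_from(edges: list[dict], starts: Iterable[str]) -> set[str]:
    """Return every signature reachable from `starts` along call edges.

    The start signatures themselves are included in the result
    (unlike nx.descendants, which excludes its source node).
    """
    # Build an adjacency list: caller signature -> callee signatures.
    adjacency: dict[str, list[str]] = {}
    for edge in edges:
        adjacency.setdefault(edge["source"], []).append(edge["target"])

    starts = list(starts)  # accept any iterable, including a generator
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for callee in adjacency.get(node, ()):
            if callee not in seen:  # the seen set also makes cycles safe
                seen.add(callee)
                queue.append(callee)
    return seen
```

With entry_sig and sink_sig standing in for your own signatures, sink_sig in reachable_from(app["call_graph"], [entry_sig]) asks the reachable.py question, and passing a list of several entrypoints answers it for all of them in one traversal.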

This works well for one application held in memory. When you want the analysis to persist, compose across many applications, or be read by other tools without re-running it, emit it into Neo4j instead.

canpy builds one analysis in memory and can project it into a labeled property graph with --emit neo4j. Every node label is Py-prefixed and every relationship type PY_-prefixed (:PyClass, PY_CALLS), so Java, TypeScript, and Python analyzers can share one database without label or relationship-type collisions. Each application is anchored at its own :PyApplication node, named by --app-name, so a single Neo4j database holds many applications and you query across them with Cypher instead of loading giant JSON blobs.

There are two ways to get the graph into Neo4j, selected solely by whether you pass --neo4j-uri.

Without --neo4j-uri, canpy writes a self-contained graph.cypher to --output: constraints and indexes, a scoped wipe of this app’s prior subgraph, then batched MERGEs. It needs no extra dependencies and contains the complete analysis:

Terminal window
canpy --input ./my-python-project --emit neo4j --app-name my-service --output ./out

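Load it into a running Neo4j with cypher-shell. If the server requires authentication, add -u and -p, or set the NEO4J_USERNAME and NEO4J_PASSWORD environment variables: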

Terminal window
cypher-shell < ./out/graph.cypher

The snapshot does a scoped DETACH DELETE of the :PyApplication {name: "my-service"} subtree before reloading, so re-running it replaces this application cleanly without touching other applications in the database.
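To make the three phases of the snapshot concrete, here is a toy generator for a script of the same shape. It is a hypothetical sketch, not canpy's implementation: the constraint name, the app property that keeps module and callable keys unique per application, and the exact statement shapes are all assumptions. Inspect your own graph.cypher for the real schema.

```python
from __future__ import annotations

import json
from itertools import islice


def _lit(value: str) -> str:
    # json.dumps gives a double-quoted string with \" and \\ escapes,
    # which Cypher string literals also accept.
    return json.dumps(value)


def _batches(rows: list[dict], size: int):
    # Yield successive chunks of at most `size` rows.
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def _unwind(rows: list[dict]) -> str:
    # Render rows as a Cypher list-of-maps literal for UNWIND.
    maps = ("{" + ", ".join(f"{k}: {_lit(v)}" for k, v in r.items()) + "}" for r in rows)
    return "[" + ", ".join(maps) + "]"


def snapshot_script(app: str, modules: dict[str, list[str]],
                    calls: list[tuple[str, str]], batch_size: int = 500) -> str:
    """Toy graph.cypher-style script: schema, scoped wipe, batched MERGEs.

    HYPOTHETICAL schema: PyModule/PyCallable carry an `app` property so keys
    are unique per application; canpy's real keys and statements may differ.
    """
    a = _lit(app)
    stmts = [
        # Phase 1: schema; IF NOT EXISTS makes re-runs safe.
        "CREATE CONSTRAINT py_app_name IF NOT EXISTS "
        "FOR (n:PyApplication) REQUIRE n.name IS UNIQUE",
        # Phase 2: delete only this application's subtree.
        f"MATCH (a:PyApplication {{name: {a}}}) "
        "OPTIONAL MATCH (a)-[:PY_HAS_MODULE]->(m:PyModule) "
        "OPTIONAL MATCH (m)-[:PY_DECLARES]->(d) DETACH DELETE d, m, a",
        f"MERGE (:PyApplication {{name: {a}}})",
    ]
    # Phase 3: one UNWIND ... MERGE statement per batch of rows.
    mods = [{"path": p} for p in modules]
    decls = [{"path": p, "sig": s} for p, sigs in modules.items() for s in sigs]
    edges = [{"src": s, "dst": t} for s, t in calls]
    for b in _batches(mods, batch_size):
        stmts.append(f"MATCH (a:PyApplication {{name: {a}}}) UNWIND {_unwind(b)} AS row "
                     f"MERGE (m:PyModule {{app: {a}, path: row.path}}) "
                     "MERGE (a)-[:PY_HAS_MODULE]->(m)")
    for b in _batches(decls, batch_size):
        stmts.append(f"UNWIND {_unwind(b)} AS row "
                     f"MATCH (m:PyModule {{app: {a}, path: row.path}}) "
                     f"MERGE (c:PyCallable {{app: {a}, signature: row.sig}}) "
                     "MERGE (m)-[:PY_DECLARES]->(c)")
    for b in _batches(edges, batch_size):
        stmts.append(f"UNWIND {_unwind(b)} AS row "
                     f"MATCH (s:PyCallable {{app: {a}, signature: row.src}}) "
                     f"MATCH (t:PyCallable {{app: {a}, signature: row.dst}}) "
                     "MERGE (s)-[:PY_CALLS]->(t)")
    return ";\n".join(stmts) + ";\n"
```

The design point the sketch illustrates is that every phase is idempotent: IF NOT EXISTS on the schema, a wipe scoped to one :PyApplication, and MERGE rather than CREATE, so replaying the script converges on the same subgraph.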

Once the graph is loaded, query it with Cypher — for example, the call edges out of a single application:

MATCH (:PyApplication {name: "my-service"})-[:PY_HAS_MODULE]->(:PyModule)
      -[:PY_DECLARES]->(c:PyCallable)-[:PY_CALLS]->(callee)
RETURN c.signature, callee.signature
LIMIT 25;
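The same query can be run from Python with the official neo4j driver. This sketch passes the application name and the limit as query parameters ($app, $limit) instead of splicing them into the Cypher text. It assumes the 5.x driver's Driver.execute_query; the function name and column aliases are illustrative.

```python
# Requires the official driver: pip install neo4j (5.x, for execute_query).
from __future__ import annotations

CALL_EDGES = """
MATCH (:PyApplication {name: $app})-[:PY_HAS_MODULE]->(:PyModule)
      -[:PY_DECLARES]->(c:PyCallable)-[:PY_CALLS]->(t)
RETURN c.signature AS caller, t.signature AS callee
LIMIT $limit
"""


def call_edges(driver, app: str, limit: int = 25) -> list[tuple[str, str]]:
    """Return up to `limit` (caller, callee) signature pairs for one application."""
    # Keyword arguments to execute_query become query parameters.
    records, _summary, _keys = driver.execute_query(CALL_EDGES, app=app, limit=limit)
    return [(r["caller"], r["callee"]) for r in records]
```

Against a live server, open a driver with from neo4j import GraphDatabase and with GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", password)) as driver:, then call call_edges(driver, "my-service").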

The graph is populated out of band by canpy; consumers just read it. The CLDK Python SDK has a read-only Neo4j backend — point it at the Bolt URI with the same application_name you loaded under, and it reconstructs the same typed PyClass / PyCallable objects and the same networkx call graph as the in-process analyzer, with no JDK, no native binary, and no project source on the consumer. It only needs the graph and read-only credentials.

read_graph.py
from cldk import CLDK
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

analysis = CLDK.python(
    backend=Neo4jConnectionConfig(
        uri="bolt://localhost:7687",
        username="neo4j",
        password="neo4j",
        application_name="my-service",  # matches canpy --app-name
    ),
)
classes = analysis.get_classes()  # Dict[str, PyClass]
cg = analysis.get_call_graph()  # networkx.DiGraph keyed by callable signatures
print(len(classes), "classes,", cg.number_of_edges(), "call edges")

The SDK’s Neo4j backend is an optional extra: pip install "cldk[neo4j]" (the quotes keep shells such as zsh from expanding the brackets). See the Neo4j property graph guide for the full schema, incremental semantics, and the SDK read API.

The default run uses Jedi for resolution: it is fast and needs no external tooling. Add --codeql to resolve the edges that lexical analysis misses (dynamic dispatch, RPC, third-party targets). The CodeQL CLI is downloaded into the project cache on first use and reused thereafter. This augmentation applies to both the json and neo4j emit targets, so the same enriched call graph is what gets projected into the property graph.

Terminal window
canpy --input ./my-python-project --output ./out --codeql
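One way to see what --codeql contributes is to run canpy twice into separate output directories, once without the flag (for example --output ./out-jedi) and once with it (--output ./out-codeql --codeql), and diff the edge sets. A minimal sketch, assuming only the source/target edge shape used in reachable.py; the directory names are arbitrary.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_analysis(out_dir: str) -> dict:
    """Read <out_dir>/analysis.json as written by canpy --output."""
    with (Path(out_dir) / "analysis.json").open(encoding="utf-8") as f:
        return json.load(f)


def edge_set(app: dict) -> set[tuple[str, str]]:
    """Call edges as (source, target) signature pairs."""
    return {(e["source"], e["target"]) for e in app["call_graph"]}


def added_edges(baseline: dict, enriched: dict) -> set[tuple[str, str]]:
    """Edges present in the enriched (--codeql) run but not in the baseline run."""
    return edge_set(enriched) - edge_set(baseline)
```

For example, sorted(added_edges(load_analysis("./out-jedi"), load_analysis("./out-codeql"))) lists the edges that only the CodeQL-augmented run found; swapping the arguments lists edges that appear only in the baseline run.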