Quickstart
canpy points at a Python project and produces one typed artifact — its symbol table, call graph, and framework entrypoints. Three steps below: install, run it against a project, and read the result. Then emit the same analysis into a Neo4j property graph.
1. Install the CLI.

       pip install codeanalyzer-python

   That installs the canpy command. Jedi and Tree-sitter ship with the package; CodeQL is downloaded on demand only if you opt in with --codeql.
2. Run it against a project.

   Point --input at any Python project root and --output at a directory for the result.

       canpy --input ./my-python-project --output ./out

   On the first run canpy creates a virtual environment under .codeanalyzer/, installs the project's dependencies into it, walks every .py file, and writes ./out/analysis.json. This is the default --emit json target.
3. Read the result.

   analysis.json is a single PyApplication object with three top-level keys.

       jq 'keys' ./out/analysis.json
       # [ "call_graph", "entrypoints", "symbol_table" ]
       jq '.symbol_table | length' ./out/analysis.json   # modules analyzed
       jq '.call_graph | length' ./out/analysis.json     # call edges

That's it: a directory of source files is now a typed, queryable model of the program.
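The same checks can be scripted. The sketch below is a minimal illustration that assumes only what is stated above: three top-level keys, with symbol_table and call_graph both supporting a length. The sample document is invented so the block runs without a canpy output; its module paths, signatures, and the dict shape of symbol_table and entrypoints are assumptions, not the real schema.

```python
import json

# Hypothetical stand-in for ./out/analysis.json. Only the three top-level
# key names come from the quickstart; the contents are invented.
SAMPLE = json.loads("""
{
  "symbol_table": {"app/main.py": {}, "app/db.py": {}},
  "call_graph": [
    {"source": "app.main.run", "target": "app.db.connect"}
  ],
  "entrypoints": {}
}
""")

EXPECTED_KEYS = {"call_graph", "entrypoints", "symbol_table"}


def summarize(app: dict) -> dict:
    """Return module and call-edge counts, failing loudly on a missing key."""
    missing = EXPECTED_KEYS - app.keys()
    if missing:
        raise KeyError(f"analysis is missing top-level keys: {sorted(missing)}")
    return {
        "modules": len(app["symbol_table"]),
        "call_edges": len(app["call_graph"]),
    }


summary = summarize(SAMPLE)
print(summary)
```

Against a real run, SAMPLE would be replaced by the result of json.load on ./out/analysis.json.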
Load it into a graph (networkx)
The call graph is a flat list of source -> target edges keyed by callable signature, so it drops straight into networkx:

    import json
    import networkx as nx

    # Load the analysis written by canpy.
    with open("./out/analysis.json") as f:
        app = json.load(f)

    # One directed edge per call: caller signature -> callee signature.
    g = nx.DiGraph()
    for edge in app["call_graph"]:
        g.add_edge(edge["source"], edge["target"])

    print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
    # Is a sink reachable from an entrypoint? A graph query, not a guess.
    # print(nx.has_path(g, entry_sig, sink_sig))

This works well for one application held in memory. When you want the analysis to persist, compose across many applications, or be read by other tools without re-running it, emit it into Neo4j instead.
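To make the reachability question concrete, here is a small sketch that answers it with a plain breadth-first search over the edge list, so it runs without networkx installed. It assumes only the source/target edge shape shown above; the callable signatures are hypothetical. With networkx available, nx.has_path or nx.descendants would give the same answers.

```python
from collections import defaultdict, deque

# Hypothetical edges in the same {"source", "target"} shape as app["call_graph"].
edges = [
    {"source": "api.handle_request", "target": "service.load_user"},
    {"source": "service.load_user", "target": "db.execute"},
    {"source": "cli.main", "target": "service.load_user"},
    {"source": "util.unused", "target": "util.helper"},
]


def build_adjacency(edges):
    """Map each caller signature to the set of callee signatures."""
    adj = defaultdict(set)
    for e in edges:
        adj[e["source"]].add(e["target"])
    return adj


def reachable_from(adj, start):
    """Breadth-first search: every node reachable from start by one or more calls."""
    visited = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return visited


adj = build_adjacency(edges)
from_api = reachable_from(adj, "api.handle_request")
from_unused = reachable_from(adj, "util.unused")

print("db.execute" in from_api)     # True: api -> service -> db
print("db.execute" in from_unused)  # False: no path from util.unused
```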
Load it into Neo4j
canpy builds one analysis in memory and can project it into a labeled property graph with --emit neo4j. Every node label is Py-prefixed and every relationship type PY_-prefixed (:PyClass, PY_CALLS), so Java, TypeScript, and Python analyzers can share one database without label or relationship-type collisions. Each application is anchored at its own :PyApplication node, named by --app-name, so a single Neo4j database holds many applications and you query across them with Cypher instead of loading giant JSON blobs.
There are two ways to get the graph into Neo4j, selected solely by whether you pass --neo4j-uri.
Without --neo4j-uri, canpy writes a self-contained graph.cypher to --output (constraints + indexes, a scoped wipe of this app’s prior subgraph, then batched MERGEs). It needs no extra dependencies and expresses the full truth of the analysis:
    canpy --input ./my-python-project --emit neo4j --app-name my-service --output ./out

Load it into a running Neo4j with cypher-shell:

    cypher-shell < ./out/graph.cypher

The snapshot does a scoped DETACH DELETE of the :PyApplication {name: "my-service"} subtree before reloading, so re-running it replaces this application cleanly without touching other applications in the database.
With --neo4j-uri, canpy pushes to a live Neo4j over Bolt incrementally — it diffs each module’s content hash against the database and only rewrites modules that changed, and on a full run it prunes modules whose source file vanished. The prune is scoped to the :PyApplication anchor named by --app-name, so writing one application never deletes another’s modules from a shared database.
    export NEO4J_URI=bolt://localhost:7687
    export NEO4J_USERNAME=neo4j
    export NEO4J_PASSWORD=secret

    canpy --input ./my-python-project --emit neo4j --app-name my-service

The live push reads NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD, and NEO4J_DATABASE from the environment (an explicit flag wins when set). Prefer the env var for the password so it doesn't land in your shell history or the process list.
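The incremental behavior described above (rewrite only modules whose content hash changed, prune modules whose file vanished) can be pictured with a small sketch. This illustrates the general idea only and is not canpy's implementation: the use of SHA-256, the plan_sync helper, and the module paths are all assumptions made for the example.

```python
import hashlib


def content_hash(source: str) -> str:
    """Hash module source text. SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def plan_sync(current_sources: dict, stored_hashes: dict) -> dict:
    """Compare current module sources with hashes recorded by a previous run.

    Both dicts are keyed by module path. Returns which modules to rewrite,
    which to prune, and which to leave alone.
    """
    current = {path: content_hash(src) for path, src in current_sources.items()}
    return {
        # New modules and modules whose hash differs get rewritten.
        "rewrite": sorted(p for p, h in current.items() if stored_hashes.get(p) != h),
        # Modules recorded earlier whose source file is gone get pruned.
        "prune": sorted(p for p in stored_hashes if p not in current),
        "unchanged": sorted(p for p, h in current.items() if stored_hashes.get(p) == h),
    }


# Hypothetical state left in the database by an earlier push.
previous = {
    "app/main.py": content_hash("print('v1')\n"),
    "app/db.py": content_hash("def connect(): ...\n"),
    "app/old.py": content_hash("# deleted since\n"),
}
# Hypothetical state of the project on disk now.
now = {
    "app/main.py": "print('v2')\n",
    "app/db.py": "def connect(): ...\n",
    "app/new.py": "x = 1\n",
}

plan = plan_sync(now, previous)
print(plan)
```

In this sketch the previous hashes would belong to a single application, which mirrors the scoping to the :PyApplication anchor: another application's modules never enter the comparison.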
Once the graph is loaded, query it with Cypher — for example, the call edges out of a single application:
    MATCH (:PyApplication {name: "my-service"})-[:PY_HAS_MODULE]->(:PyModule)
          -[:PY_DECLARES]->(c:PyCallable)-[:PY_CALLS]->(callee)
    RETURN c.signature, callee.signature
    LIMIT 25;

Read it back with the CLDK SDK
The graph is populated out of band by canpy; consumers just read it. The CLDK Python SDK has a read-only Neo4j backend: point it at the Bolt URI with the same application_name you loaded under, and it reconstructs the same typed PyClass / PyCallable objects and the same networkx call graph as the in-process analyzer, with no JDK, no native binary, and no project source on the consumer. It only needs the graph and read-only credentials.
    from cldk import CLDK
    from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

    analysis = CLDK.python(
        backend=Neo4jConnectionConfig(
            uri="bolt://localhost:7687",
            username="neo4j",
            password="neo4j",
            application_name="my-service",  # matches canpy --app-name
        ),
    )

    classes = analysis.get_classes()   # Dict[str, PyClass]
    cg = analysis.get_call_graph()     # networkx.DiGraph keyed by callable signatures

    print(len(classes), "classes,", cg.number_of_edges(), "call edges")

The Neo4j backend in the SDK is the same optional extra: pip install cldk[neo4j]. See the Neo4j property graph guide for the full schema, incremental semantics, and the SDK read API.
Go deeper with CodeQL
The default run uses Jedi for resolution: fast, with no external tooling. Add --codeql to resolve the edges lexical analysis misses (dynamic dispatch, RPC, third-party targets). The CodeQL CLI is downloaded into the project cache on first use and reused thereafter. This augmentation applies to both the json and neo4j emit targets; the same enriched call graph is what gets projected into the property graph.

    canpy --input ./my-python-project --output ./out --codeql
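One way to see what --codeql contributed is to run canpy twice into separate output directories and diff the two edge lists. The sketch below assumes only the source/target edge shape used in the networkx example; the edges themselves are hypothetical stand-ins for the two call_graph lists, and the edge_set helper is introduced here for illustration.

```python
def edge_set(call_graph):
    """Reduce a call_graph list to a set of (source, target) pairs."""
    return {(e["source"], e["target"]) for e in call_graph}


# Hypothetical call_graph from a default (Jedi-only) run.
baseline = [
    {"source": "api.handle", "target": "service.load_user"},
]
# Hypothetical call_graph from a run with --codeql.
enriched = [
    {"source": "api.handle", "target": "service.load_user"},
    {"source": "service.load_user", "target": "plugins.Backend.fetch"},
]

# Edges present only in the enriched run.
added = sorted(edge_set(enriched) - edge_set(baseline))
for source, target in added:
    print(f"{source} -> {target}")
```

On real data, each list would come from the call_graph key of the corresponding analysis.json.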