codeanalyzer-python

Point it at a Python project and get back a typed symbol table and call graph — as an analysis.json or a Neo4j property graph. Program analysis your agents can call.

Point canpy at a project and it builds one analysis in memory — a typed model of every module, class, method, and call edge, plus the framework entrypoints that reach them — then emits it the way you need it. It’s the Python backend behind CLDK, usable standalone as a CLI or a library.

One analysis, three output targets via --emit:

  • analysis.json (default) — the self-contained PyApplication artifact, loaded whole into memory by the consumer.
  • Neo4j property graph (--emit neo4j) — project the same model into a labeled property graph: a graph.cypher snapshot, or an incremental live push to Neo4j over Bolt. Every node label is Py-prefixed and every relationship type PY_-prefixed (:PyClass, PY_CALLS), so Java, TypeScript, and Python analyzers can share one database without label collisions. The graph is a queryable, persistent system of record that holds many applications at once — cross-service questions become a Cypher traversal instead of parsing giant JSON blobs.
  • Schema contract (--emit schema) — the machine-readable, version-stamped Neo4j schema (schema_version 1.1.0), no project required.

Build the analysis once and project it into a graph. Without --neo4j-uri, canpy writes a self-contained graph.cypher (constraints + indexes, a scoped wipe of this app’s prior subgraph, then batched MERGEs) that you load with cypher-shell:

canpy --input ./my-service --emit neo4j --app-name my-service
cypher-shell < graph.cypher
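
As a rough illustration of the three phases that snapshot goes through (schema, scoped wipe, batched MERGEs), the sketch below renders a toy script in the same order. It is not canpy's implementation and does not reproduce its real schema: the `key` and `name` properties, the `PY_CONTAINS` relationship, the constraint name, and the helper functions are all hypothetical. Only the `:PyApplication` and `:PyClass` labels come from the text above.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def quote(value: str) -> str:
    """Render a Python string as a single-quoted Cypher string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def batches(rows: Iterable[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def render_snapshot(app_name: str, classes: List[Dict[str, str]], batch_size: int = 500) -> str:
    """Build a replayable script: schema, scoped wipe, then batched MERGEs."""
    # Hypothetical anchor pattern; the real property name may differ.
    anchor = f"(a:PyApplication {{name: {quote(app_name)}}})"
    stmts = [
        # 1. Schema first, written idempotently so the script can be re-run.
        "CREATE CONSTRAINT py_class_key IF NOT EXISTS "
        "FOR (c:PyClass) REQUIRE c.key IS UNIQUE",
        # 2. Scoped wipe: delete only nodes attached to this application's anchor.
        f"MATCH {anchor}-[:PY_CONTAINS]->(n) DETACH DELETE n",
        # Make sure the anchor itself exists before attaching nodes to it.
        f"MERGE {anchor}",
    ]
    # 3. Batched MERGEs: one UNWIND statement per chunk of rows.
    for chunk in batches(classes, batch_size):
        rows = ", ".join(
            f"{{key: {quote(r['key'])}, name: {quote(r['name'])}}}" for r in chunk
        )
        stmts.append(
            f"UNWIND [{rows}] AS row "
            f"MATCH {anchor} "
            "MERGE (c:PyClass {key: row.key}) SET c.name = row.name "
            "MERGE (a)-[:PY_CONTAINS]->(c)"
        )
    return ";\n".join(stmts) + ";\n"


if __name__ == "__main__":
    demo = [{"key": "my-service::pkg.mod.Greeter", "name": "Greeter"}]
    print(render_snapshot("my-service", demo))
```

Because the wipe matches only through the application anchor, replaying a script like this would leave other applications' subgraphs alone, which is the property the scoped wipe is meant to provide.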

With --neo4j-uri, it pushes to a live Neo4j over Bolt incrementally — only modules whose content hash changed are rewritten, and on a full run modules whose source file vanished are pruned. The push is scoped to the :PyApplication anchor named by --app-name, so writing one application never clobbers another’s modules in a shared database:

export NEO4J_URI=bolt://localhost:7687
# prefer the env var so it stays out of shell history
export NEO4J_PASSWORD='<your-password>'
canpy --input ./my-service --emit neo4j --app-name my-service

The live push needs the neo4j driver extra (pip install 'codeanalyzer-python[neo4j]'); the snapshot and schema modes need nothing extra.
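
The incremental behavior described above (rewrite only modules whose content hash changed, prune modules whose source file vanished) can be pictured with a small planning function. This is a sketch under assumptions, not canpy's code: the text only says "content hash", so SHA-256 is an assumption, and `PushPlan`, `content_hash`, and `plan_push` are hypothetical names.

```python
import hashlib
from typing import Dict, List, NamedTuple


class PushPlan(NamedTuple):
    rewrite: List[str]  # new or changed modules that must be written again
    prune: List[str]    # modules in the graph whose source file is gone
    skip: List[str]     # unchanged modules that are left untouched


def content_hash(source: bytes) -> str:
    """Hash a module's bytes (SHA-256 is assumed here for illustration)."""
    return hashlib.sha256(source).hexdigest()


def plan_push(sources: Dict[str, bytes], stored: Dict[str, str]) -> PushPlan:
    """Compare on-disk sources with the hashes already recorded in the graph.

    `sources` maps module path to current file bytes; `stored` maps module
    path to the hash recorded by the previous push.
    """
    rewrite: List[str] = []
    skip: List[str] = []
    for path in sorted(sources):
        if stored.get(path) == content_hash(sources[path]):
            skip.append(path)
        else:
            rewrite.append(path)
    # Anything the graph remembers but the project no longer contains.
    prune = sorted(set(stored) - set(sources))
    return PushPlan(rewrite, prune, skip)


if __name__ == "__main__":
    previous = {"a.py": content_hash(b"x = 1\n"), "old.py": content_hash(b"")}
    current = {"a.py": b"x = 1\n", "b.py": b"y = 2\n"}
    print(plan_push(current, previous))
```

Pruning only makes sense when `sources` covers the whole project, which matches the note above that vanished modules are pruned on a full run.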

A separate job populates the graph out of band; consumers just read it. The CLDK Python SDK has a read-only Neo4j backend — point it at the Bolt URI and it reconstructs the same typed PyClass/PyCallable objects and the same networkx call graph as the in-process analyzer, with no JDK, no native binary, and no project source on the consumer. It only needs the graph and read-only credentials.

from cldk import CLDK
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

analysis = CLDK.python(
    backend=Neo4jConnectionConfig(
        uri="bolt://localhost:7687",
        username="neo4j",
        password="neo4j",
        application_name="my-service",  # matches canpy --app-name
    ),
)

classes = analysis.get_classes()  # Dict[str, PyClass]
cg = analysis.get_call_graph()  # networkx.DiGraph keyed by callable signatures

application_name matches the --app-name the graph was loaded with, scoping every query to that one application. The neo4j driver is an optional extra here too: pip install 'cldk[neo4j]'.
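
Once the call graph is in hand, a typical question is "what can this callable reach?". The sketch below answers it with a plain breadth-first search that depends only on a successor function, so it runs without networkx. The signatures in the demo are made up for illustration, and `reachable_from` is a hypothetical helper, not part of the CLDK API.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set


def reachable_from(start: str, successors: Callable[[str], Iterable[str]]) -> Set[str]:
    """Return every callable transitively reachable from `start`.

    `start` itself is excluded from the result, even when a cycle leads
    back to it.
    """
    seen: Set[str] = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for callee in successors(node):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen - {start}


if __name__ == "__main__":
    # Toy call edges keyed by invented signatures.
    edges: Dict[str, List[str]] = {
        "app.main": ["app.load", "app.run"],
        "app.run": ["app.load", "app.main"],
        "app.load": [],
    }
    print(sorted(reachable_from("app.main", lambda n: edges.get(n, []))))
```

With the `networkx.DiGraph` returned by `get_call_graph()`, passing `cg.successors` as the second argument should work, since `DiGraph` exposes `successors(n)`; `networkx.descendants(cg, start)` computes the same set directly.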