A Python static-analysis toolkit — the CLDK backend that emits a canonical symbol table and call graph, as analysis.json or a Neo4j property graph.
canpy is a static analyzer for Python built on Jedi, with optional
CodeQL-resolved call edges and
Tree-sitter parsing. It produces the canonical CodeLLM-DevKit
(CLDK) analysis.json — a symbol table plus a call graph — and can project that same analysis into a
Neo4j property graph. It is the Python backend behind
CLDK, mirroring its
TypeScript (cants) and
Java siblings.
Every run produces a symbol table and a call graph. Edges come from Jedi's lexical resolution by
default; --codeql resolves additional edges (RPC / third-party / dynamically-dispatched targets)
and merges them with the Jedi-derived edges, also backfilling callees Jedi could not resolve.
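The merge of Jedi-derived and CodeQL-derived edges can be pictured with a small sketch. It is illustrative only, not canpy's implementation: the `Edge` class, the `merge_edges` helper, the provenance labels `"jedi"` and `"codeql"`, and the choice to keep the larger weight on a duplicate edge are all assumptions. Only the field names (source, target, weight, provenance) come from the documented call-graph shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical call edge keyed by caller and callee signature."""
    source: str
    target: str
    weight: int = 1
    provenance: tuple[str, ...] = ()


def merge_edges(jedi_edges, codeql_edges):
    """Union two edge lists, combining provenance when both resolvers agree."""
    merged = {}
    for edge in [*jedi_edges, *codeql_edges]:
        key = (edge.source, edge.target)
        seen = merged.get(key)
        if seen is None:
            # First time this (source, target) pair appears: keep it as-is.
            merged[key] = edge
        else:
            # Same pair from another resolver: record both provenances once,
            # and keep the larger weight (an assumption made for this sketch).
            provenance = tuple(dict.fromkeys(seen.provenance + edge.provenance))
            merged[key] = Edge(edge.source, edge.target,
                               max(seen.weight, edge.weight), provenance)
    return list(merged.values())
```

Edges that only CodeQL found appear in the result with CodeQL provenance alone, which is one way to model the backfill of callees Jedi could not resolve.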
- Symbol table: modules, classes, functions, methods, variables, decorators, imports, and docstrings, with precise source spans.
- Call graph: Jedi's lexical resolver by default, with optional CodeQL-resolved edges (--codeql) for RPC / third-party / dynamically-dispatched targets, merged with the Jedi edges; CodeQL also backfills callees Jedi could not resolve.
- Neo4j output: project the analysis into a labeled property graph, either a self-contained graph.cypher snapshot or an incremental push to a live database over Bolt.
- Versioned schema: a machine-readable, version-stamped Neo4j schema contract (--emit schema), checked in as schema.neo4j.json and shipped with every release.
- Incremental cache: per-file results are cached under .codeanalyzer; --lazy (default) reuses them, --eager forces a clean rebuild. --ray distributes the work across cores.
- Compact output: canonical analysis.json, or binary analysis.msgpack for smaller artifacts.
- Python 3.10 or newer.
- A C toolchain and the venv / development headers. The analyzer builds an isolated virtual environment per project (via Python's venv) so Jedi can resolve types and imports:

      # Ubuntu / Debian
      sudo apt install python3-venv python3-dev build-essential
      # Fedora / RHEL / CentOS
      sudo dnf group install "Development Tools" && sudo dnf install python3-venv python3-devel
      # macOS
      xcode-select --install

pip install codeanalyzer-python
canpy --help

For the optional live Neo4j push (--emit neo4j --neo4j-uri …), install the neo4j extra:

pip install 'codeanalyzer-python[neo4j]'

Install the CLI as an isolated tool with the one-line installer (provisions via uv / pipx / pip):

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/codellm-devkit/codeanalyzer-python/releases/latest/download/canpy-installer.sh | sh

brew install codellm-devkit/tap/codeanalyzer-python

The formula depends on uv and installs canpy as an isolated,
version-pinned uv tool (the package and its dependencies are resolved and cached on first run).
This project uses uv for dependency management.
git clone https://github.com/codellm-devkit/codeanalyzer-python
cd codeanalyzer-python
uv sync --all-groups
uv run canpy --help

canpy --input /path/to/python/project

With no --output, the analysis is printed to stdout as compact JSON; with --output <dir> it is
written to analysis.json (or graph.cypher for --emit neo4j, or analysis.msgpack with
--format msgpack) in that directory.
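Once analysis.json has been written, it can be read back with the standard library. The sketch below assumes only the two top-level keys this README documents, `symbol_table` and `call_graph`; the helper name `summarize_analysis` is hypothetical and not part of canpy.

```python
import json
from pathlib import Path


def summarize_analysis(output_dir):
    """Count modules and call edges in <output_dir>/analysis.json."""
    path = Path(output_dir) / "analysis.json"
    document = json.loads(path.read_text(encoding="utf-8"))
    # symbol_table maps file path -> module; call_graph is a list of edges.
    symbol_table = document.get("symbol_table", {})
    call_graph = document.get("call_graph", [])
    return {"modules": len(symbol_table), "call_edges": len(call_graph)}
```

For example, `summarize_analysis("./out")` after `canpy --input ./my-python-project --output ./out` would report how many modules and edges the run produced.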
$ canpy --help
Usage: canpy [OPTIONS] COMMAND [ARGS]...
Static Analysis on Python source code using Jedi, CodeQL and Tree sitter.
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --input -i PATH Path to the │
│ project root │
│ directory (not │
│ required for │
│ --emit schema). │
│ --output -o PATH Output directory │
│ for artifacts. │
│ --format -f [json|msgpack] Output format for │
│ --emit json: json │
│ or msgpack. │
│ [default: json] │
│ --emit [json|neo4j|sche Output target: │
│ ma] json │
│ (analysis.json, │
│ default) | neo4j │
│ (graph.cypher or │
│ live Bolt push) | │
│ schema (the Neo4j │
│ schema.json │
│ contract). │
│ [default: json] │
│ --app-name TEXT Logical │
│ application name │
│ for the graph │
│ :PyApplication │
│ anchor (default: │
│ input dir name). │
│ --neo4j-uri TEXT Push the graph to │
│ a live Neo4j over │
│ Bolt │
│ (incremental); │
│ omit to write │
│ graph.cypher. │
│ [env var: │
│ NEO4J_URI] │
│ --neo4j-user TEXT Neo4j username. │
│ [env var: │
│ NEO4J_USERNAME] │
│ [default: neo4j] │
│ --neo4j-password TEXT Neo4j password. │
│ Prefer the env │
│ var over the flag │
│ (the flag is │
│ visible in shell │
│ history / process │
│ list). │
│ [env var: │
│ NEO4J_PASSWORD] │
│ [default: neo4j] │
│ --neo4j-database TEXT Neo4j database │
│ name (default: │
│ server default). │
│ [env var: │
│ NEO4J_DATABASE] │
│ --codeql --no-codeql Enable │
│ CodeQL-based │
│ analysis. │
│ [default: │
│ no-codeql] │
│ --ray --no-ray Enable Ray for │
│ distributed │
│ analysis. │
│ [default: no-ray] │
│ --eager --lazy Enable eager or │
│ lazy analysis. │
│ Defaults to lazy. │
│ [default: lazy] │
│ --skip-tests --include-tests Skip test files │
│ in analysis. │
│ [default: │
│ skip-tests] │
│ --no-venv --venv Skip virtualenv │
│ creation and │
│ dependency │
│ installation; │
│ resolve imports │
│ against the │
│ ambient Python │
│ environment │
│ instead. │
│ [default: venv] │
│ --file-name PATH Analyze only the │
│ specified file │
│ (relative to │
│ input directory). │
│ --cache-dir -c PATH Directory to │
│ store analysis │
│ cache. Defaults │
│ to │
│ '.codeanalyzer' │
│ in the input │
│ directory. │
│ --clear-cache --keep-cache Clear cache after │
│ analysis. By │
│ default, cache is │
│ retained. │
│ [default: │
│ keep-cache] │
│ -v INTEGER Increase │
│ verbosity: -v, │
│ -vv, -vvv │
│ [default: 0] │
│ --help Show this message │
│ and exit. │
╰──────────────────────────────────────────────────────────────────────────────╯
- Basic analysis to stdout, or to a file:

      canpy --input ./my-python-project                  # compact JSON on stdout
      canpy --input ./my-python-project --output ./out   # → ./out/analysis.json

- Binary output (msgpack):

      canpy --input ./my-python-project --output ./out --format msgpack   # → ./out/analysis.msgpack

- Resolve extra call edges with CodeQL:

      canpy --input ./my-python-project --codeql

  By default, edges come from Jedi's lexical analysis. Adding --codeql resolves additional edges (including RPC / third-party / dynamically-dispatched targets) and merges them with the Jedi-derived edges; CodeQL also backfills callees Jedi could not resolve. CodeQL integration is experimental; the CLI is downloaded into <cache_dir>/codeql/ on first use.

- Emit a Neo4j snapshot, or push to a live database:

      canpy --input ./my-python-project --emit neo4j --output ./out   # → ./out/graph.cypher
      canpy --input ./my-python-project --emit neo4j \
        --neo4j-uri bolt://localhost:7687 --neo4j-user neo4j --neo4j-password secret

- Emit the Neo4j schema contract:

      canpy --emit schema                  # print schema.json to stdout (no project needed)
      canpy --emit schema --output ./out   # → ./out/schema.json

- Force a clean rebuild with a custom cache directory:

      canpy --input ./my-python-project --eager --cache-dir /path/to/custom-cache

canpy builds one analysis in memory and can emit it three ways (--emit):
--emit json (the default) writes a PyApplication document, the canonical CLDK contract.
By default this is printed to stdout in JSON; with --output it is written to analysis.json (or
analysis.msgpack with --format msgpack, a more compact binary format).
--emit neo4j projects the same analysis into a labeled property graph. Every node label is
Py-prefixed and every relationship type is PY_-prefixed (e.g. :PyClass, PY_CALLS) so multiple
language analyzers can share one database without label or relationship-type collisions. Declarations
are keyed by their signature under a shared :PySymbol label; calls, imports, inheritance,
decorators, and call sites are relationships:
- Without --neo4j-uri: writes a self-contained graph.cypher (constraints + indexes, a scoped wipe, then batched MERGEs). Load it with cypher-shell < graph.cypher. Needs no extra dependencies.
- With --neo4j-uri: pushes to a live Neo4j over Bolt incrementally. Only modules whose content hash changed are rewritten, and on a full run modules whose source file vanished are pruned. Requires the neo4j extra.

Every graph carries a schema_version on its :PyApplication node.
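The incremental push above amounts to a diff between stored and current content hashes. The following is a minimal sketch of that idea, not canpy's code: the use of SHA-256, the helper names `content_hash` and `plan_incremental_push`, and the dict-based inputs are assumptions chosen for illustration.

```python
import hashlib


def content_hash(source: bytes) -> str:
    """Hash a module's source bytes (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(source).hexdigest()


def plan_incremental_push(stored_hashes, current_sources):
    """Decide which modules to rewrite and which to prune.

    stored_hashes:   {module path: hash already recorded in the graph}
    current_sources: {module path: source bytes currently on disk}
    """
    current_hashes = {path: content_hash(src) for path, src in current_sources.items()}
    # Rewrite modules that are new or whose content hash changed.
    rewrite = sorted(path for path, digest in current_hashes.items()
                     if stored_hashes.get(path) != digest)
    # Prune modules whose source file no longer exists.
    prune = sorted(path for path in stored_hashes if path not in current_hashes)
    return rewrite, prune
```

Unchanged modules fall into neither list, which is what keeps a repeated push cheap.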
Call-graph endpoints that aren't present in the symbol table (third-party / framework / RPC targets)
are materialized as :PyExternal ghost nodes, mirroring the analyzer's own ghost-node behaviour.
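Which endpoints would become ghost nodes can be worked out from the analysis itself: any call-graph endpoint whose signature is not declared in the symbol table. A rough sketch follows, assuming edges are dicts with `source` and `target` keys as in the documented call-graph shape; the helper `external_endpoints` and the flat set of known signatures are hypothetical simplifications.

```python
def external_endpoints(known_signatures, call_graph):
    """Return call-graph endpoints that are absent from the symbol table."""
    known = set(known_signatures)
    ghosts = []
    for edge in call_graph:
        for endpoint in (edge["source"], edge["target"]):
            # Keep first-seen order and avoid duplicates.
            if endpoint not in known and endpoint not in ghosts:
                ghosts.append(endpoint)
    return ghosts
```

In the graph projection, each such signature would correspond to one :PyExternal node rather than a declaration node.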
The connection options also read from the standard Neo4j environment variables — NEO4J_URI,
NEO4J_USERNAME, NEO4J_PASSWORD, NEO4J_DATABASE — when the corresponding flag is omitted (an
explicit flag wins). Prefer the env var for the password so it doesn't land in shell history or the
process list:
export NEO4J_URI=bolt://localhost:7687
export NEO4J_PASSWORD=secret
canpy -i ./my-project --emit neo4j # credentials picked up from the environment

--emit schema writes the machine-readable, version-stamped Neo4j schema (schema.json: node labels,
relationships, properties, constraints, and indexes). It needs no project and is checked into the repo
as schema.neo4j.json and bundled in every release as a GitHub Release asset, so a consumer can
validate producer/consumer compatibility without invoking the tool. The shape of the contract matches
the codeanalyzer-typescript backend.
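A consumer-side compatibility check against the schema contract could look like the sketch below. It rests on assumptions that this README does not confirm: that schema.json exposes a top-level `schema_version` key, that the version is a dotted string, and that matching major versions means compatible. The helper `is_compatible` is hypothetical.

```python
import json


def is_compatible(schema_text, expected_major):
    """Check that a schema document's major version matches the consumer's."""
    schema = json.loads(schema_text)
    # "schema_version" as a top-level key is an assumption of this sketch.
    version = str(schema.get("schema_version", ""))
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) == expected_major
```

A consumer might run this against the release asset before loading a graph, and refuse to proceed when it returns False.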
A UML of the analysis.json schema (the PyApplication containment tree) is checked in as
schema-uml.drawio, and the property-graph schema as
neo4j-schema.drawio.
This project uses uv.
uv sync --all-groups
uv run canpy --input /path/to/project # run from source
uv run canpy --emit schema > schema.neo4j.json # regenerate the checked-in schema contract
uv run python scripts/update_readme.py # regenerate the canpy --help block above
uv run pytest # run the test suite

The Neo4j schema-conformance test always runs. The Neo4j Bolt integration test spins up a real Neo4j via Testcontainers and is opt-in: it needs a container runtime (Docker or Podman) and is enabled with an environment variable:

RUN_CONTAINER_TESTS=1 uv run pytest test/test_neo4j_bolt.py -s

Apache 2.0; see LICENSE.

The analysis.json document has the following top-level shape:

{
  "symbol_table": { /* file path → module (classes, functions, variables, imports, …) */ },
  "call_graph": [ /* CALL_DEP edges: { source, target, weight, provenance } keyed by callable signature */ ]
}