CLI options
    canpy [OPTIONS]

Static analysis on Python source code using Jedi, CodeQL, and Tree-sitter. canpy builds one analysis in memory and emits it to one of three targets selected by --emit: the default analysis.json (symbol table + call graph), a Neo4j property graph, or the version-stamped Neo4j schema contract.
Options
| Option | Alias | Type | Default | Description |
|---|---|---|---|---|
| --input | -i | PATH | None | Path to the project root directory. Required for --emit json and --emit neo4j; not required for --emit schema. |
| --output | -o | PATH | None | Output directory for artifacts. Behavior depends on --emit (see Output files). |
| --format | -f | json \| msgpack | json | Output serialization for --emit json. |
| --emit | | json \| neo4j \| schema | json | Output target. json → analysis.json; neo4j → graph.cypher snapshot or a live Bolt push; schema → the Neo4j schema.json contract. |
| --app-name | | TEXT | input dir name | Logical application name for the graph’s :PyApplication anchor. Defaults to the resolved basename of --input. |
| --neo4j-uri | | TEXT | None | Push the graph to a live Neo4j over Bolt (incremental). Omit to write graph.cypher. Reads NEO4J_URI. Requires the neo4j extra. |
| --neo4j-user | | TEXT | neo4j | Neo4j username. Reads NEO4J_USERNAME. |
| --neo4j-password | | TEXT | neo4j | Neo4j password. Reads NEO4J_PASSWORD. Prefer the env var: the flag is visible in shell history and the process list. |
| --neo4j-database | | TEXT | None | Neo4j database name. None uses the server default. Reads NEO4J_DATABASE. |
| --codeql / --no-codeql | | flag | --no-codeql | Enable CodeQL-based call-graph resolution in addition to Jedi. |
| --ray / --no-ray | | flag | --no-ray | Use Ray to build the symbol table in parallel. |
| --eager / --lazy | | flag | --lazy | Rebuild the analysis (and venv) from scratch vs. reuse cache. |
| --skip-tests / --include-tests | | flag | --skip-tests | Exclude or include test files in the analysis. |
| --no-venv / --venv | | flag | --venv | Resolve imports against the ambient interpreter instead of building a per-project venv. Useful in CI, containers, and sandboxed runs. |
| --file-name | | PATH | None | Analyze only this file (relative to --input; must be .py). |
| --cache-dir | -c | PATH | None | Where to store the cache. Defaults to .codeanalyzer in the input dir. |
| --clear-cache / --keep-cache | | flag | --keep-cache | Delete the cache on exit vs. retain it. |
| -v | | count | 0 | Increase verbosity: -v (info), -vv (debug), -vvv (trace). |
| --help | | | | Show the help message and exit. |
Notes on defaults
- Lazy by default. Analysis reuses cached results for unchanged files. Use --eager to force a full rebuild.
- Cache is kept by default. The cache survives between runs. Use --clear-cache to discard it on exit.
- Tests excluded by default. Files under test or tests directories, or named test_*.py / *_test.py, are skipped unless you pass --include-tests.
- CodeQL off by default. Jedi resolves the call graph alone unless --codeql is set; CodeQL augments it.
- A venv is built by default. canpy provisions a per-project analysis venv (built with uv, falling back to pip) and wires it to Jedi for import resolution. Pass --no-venv to skip it and resolve against the ambient interpreter.
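The test-exclusion rule above can be approximated in a few lines. This is a minimal sketch of the documented rule, not canpy's actual implementation: the function name `is_test_file` and the choice to match directory names at any depth are assumptions made for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Directory and file-name patterns taken from the rule described above.
TEST_DIR_NAMES = {"test", "tests"}
TEST_FILE_PATTERNS = ("test_*.py", "*_test.py")


def is_test_file(relative_path: str) -> bool:
    """Return True if a project-relative path matches the documented test rule.

    Hypothetical helper: canpy's real matching logic may differ in detail.
    """
    path = PurePosixPath(relative_path)
    # Any parent directory named test or tests marks the file as a test file.
    if any(part in TEST_DIR_NAMES for part in path.parts[:-1]):
        return True
    # Otherwise fall back to the file-name patterns.
    return any(fnmatchcase(path.name, pattern) for pattern in TEST_FILE_PATTERNS)
```

Under this sketch, `src/app/routes_test.py` and `tests/unit/helpers.py` would be skipped by default, while `src/testing/utils.py` would still be analyzed.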
Output files
What canpy produces depends on --emit:

| --emit | With --output ./out | Without --output |
|---|---|---|
| json (default) | ./out/analysis.json (or analysis.msgpack with --format msgpack) | compact JSON on stdout |
| neo4j (no --neo4j-uri) | ./out/graph.cypher | graph.cypher in the current directory |
| neo4j (with --neo4j-uri) | live Bolt push; --output is unused | live Bolt push |
| schema | ./out/schema.json | schema.json on stdout |
For --emit json, --output names a directory, not a file — canpy writes analysis.json (or analysis.msgpack) inside it, creating the directory if needed. The json and msgpack formats encode the same PyApplication schema; see Output schema. The msgpack form is gzip-compressed MessagePack, and the CLI logs the compression ratio vs. JSON.
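For scripts that need to know where an artifact will land, the table above can be restated as a small lookup. This is a sketch that only mirrors the documented behavior; `artifact_path` is a hypothetical helper, not part of canpy, and it returns `None` wherever the table says the result goes to stdout or to a live Bolt push.

```python
from pathlib import Path
from typing import Optional


def artifact_path(emit: str = "json", output: Optional[str] = None,
                  fmt: str = "json", neo4j_uri: Optional[str] = None) -> Optional[Path]:
    """Return the file canpy is documented to write, or None for stdout / live push."""
    if emit == "json":
        if output is None:
            return None  # compact JSON goes to stdout
        name = "analysis.msgpack" if fmt == "msgpack" else "analysis.json"
        return Path(output) / name  # --output is a directory, not a file
    if emit == "neo4j":
        if neo4j_uri is not None:
            return None  # live Bolt push; --output is unused
        return Path(output or ".") / "graph.cypher"
    if emit == "schema":
        return None if output is None else Path(output) / "schema.json"
    raise ValueError(f"unknown emit target: {emit!r}")
```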
Emit targets
--emit chooses where the single in-memory analysis goes. The analysis is built once; the target only decides how it is projected.
analysis.json — the default
The canonical symbol table and call graph as one JSON document. This is the in-process artifact every other tool has always consumed.

    # Symbol table + call graph to stdout
    canpy --input ./my-python-project

    # Write JSON to a directory
    canpy --input ./my-python-project --output ./out

    # msgpack + CodeQL, eager rebuild, cache discarded
    canpy --input ./my-python-project --output ./out --format msgpack --codeql --eager --clear-cache

Neo4j property graph — --emit neo4j
Projects the same in-memory PyApplication into a labeled property graph. Every node label is Py-prefixed and every relationship type is PY_-prefixed (e.g. :PyClass, PY_CALLS), so Java, TypeScript, and Python analyzers can share one database without label or relationship-type collisions. Declarations are keyed by their signature under a shared :PySymbol label, and the application is anchored at a single :PyApplication node named by --app-name. The full topology is documented in the graph schema reference.
There are two sub-modes, decided solely by whether --neo4j-uri is set.
Without --neo4j-uri: a self-contained graph.cypher snapshot. The file contains constraints and indexes, a scoped DETACH DELETE wipe of this app’s prior subtree, then batched UNWIND … MERGE statements for nodes and edges. It needs no extra dependencies and expresses the full truth of the analysis; it is not incremental. Load it with cypher-shell:

    canpy --input ./my-python-project --emit neo4j --app-name my-service --output ./out
    cypher-shell < ./out/graph.cypher

With --neo4j-uri: an incremental live Bolt push. It ensures the DDL is in place, diffs each module’s content_hash against the database, and rewrites only modules whose content changed. Shared :PyExternal / :PyPackage / :PyDecorator nodes are MERGE-only and nodes are never blindly deleted, so cross-module references survive. On a full run (no --file-name), modules whose source file vanished are pruned. Pruning is scoped to this app’s :PyApplication anchor, so pushing app B never deletes app A’s modules from a shared database. Every graph carries a schema_version on its :PyApplication node (currently 1.1.0).
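The incremental decision described above comes down to comparing two maps of module path to content_hash. The sketch below illustrates that idea only; `plan_push` is a hypothetical function, and it assumes the `remote` map has already been restricted to modules under this app's :PyApplication anchor.

```python
from typing import Dict, List, Tuple


def plan_push(local: Dict[str, str], remote: Dict[str, str],
              full_run: bool = True) -> Tuple[List[str], List[str]]:
    """Return (modules to rewrite, modules to prune) for one application.

    local and remote map a module path to its content_hash. This is an
    illustration of the documented behavior, not canpy's own code.
    """
    # Rewrite modules that are new or whose hash differs from the database.
    rewrite = sorted(m for m, h in local.items() if remote.get(m) != h)
    # Prune vanished modules only on a full run (no --file-name).
    prune = sorted(m for m in remote if m not in local) if full_run else []
    return rewrite, prune
```

A targeted single-file push corresponds to `full_run=False`: the changed module is rewritten, and nothing is pruned even though most of the database's modules are absent from `local`.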
Prefer the environment for the password so it never lands in shell history or the process list:
    export NEO4J_URI=bolt://localhost:7687
    export NEO4J_PASSWORD=secret
    canpy --input ./my-python-project --emit neo4j --app-name my-service

The same run spelled out with flags:

    canpy --input ./my-python-project --emit neo4j --app-name my-service \
        --neo4j-uri bolt://localhost:7687 --neo4j-user neo4j
    # NEO4J_PASSWORD is read from the environment

The live Bolt path needs the optional neo4j driver extra. If it is missing, canpy raises a clear error telling you to install it:

    pip install 'codeanalyzer-python[neo4j]'

The snapshot (graph.cypher) and --emit schema modes need nothing extra.
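A wrapper script that prepares a push may want to check the connection settings before invoking canpy. The following is a minimal sketch using the environment variable names and defaults listed in the options table; the `Neo4jSettings` class and `load_neo4j_settings` function are hypothetical, and how canpy resolves a flag against an environment variable when both are set is not modeled here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Neo4jSettings:
    uri: Optional[str]       # None means: write graph.cypher instead of pushing
    user: str
    password: str
    database: Optional[str]  # None means: use the server default database


def load_neo4j_settings(env: Mapping[str, str] = os.environ) -> Neo4jSettings:
    """Collect Neo4j settings from the environment, using the documented defaults."""
    return Neo4jSettings(
        uri=env.get("NEO4J_URI"),
        user=env.get("NEO4J_USERNAME", "neo4j"),
        password=env.get("NEO4J_PASSWORD", "neo4j"),
        database=env.get("NEO4J_DATABASE"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in tests without touching the real environment.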
Schema contract — --emit schema
Emits the machine-readable, version-stamped Neo4j schema (schema.json: node labels, relationship types, and their properties). It is a static catalog, so no project is required and --input is optional here.

    # Print the schema contract to stdout (no project needed)
    canpy --emit schema

    # Write it to a directory
    canpy --emit schema --output ./out   # → ./out/schema.json

The schema carries SCHEMA_VERSION (currently 1.1.0), the same value stamped onto every graph’s :PyApplication node. Pin to it so consumers can detect contract changes.
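One way a consumer might pin to the contract is to compare the emitted version against the one it was written for. The sketch below is illustrative only: the JSON key name `schema_version` inside schema.json is an assumption (the source only confirms that name for the :PyApplication node property), so adjust `key` to whatever the emitted file actually uses.

```python
import json
from typing import Tuple

PINNED_VERSION = "1.1.0"  # the contract version this consumer was written against


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split a MAJOR.MINOR.PATCH string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def contract_changed(schema_text: str, pinned: str = PINNED_VERSION,
                     key: str = "schema_version") -> bool:
    """Return True when the emitted schema's version differs from the pin.

    The key name is an assumption about schema.json's layout.
    """
    found = json.loads(schema_text)[key]
    return parse_version(found) != parse_version(pinned)
```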
Reading the graph back with CLDK
Once a graph is in Neo4j, the CLDK Python SDK reads it directly: no JDK, no native binary, and no project source on the consumer, only read-only Neo4j credentials. Point CLDK.python at a Neo4jConnectionConfig whose application_name matches the --app-name the graph was loaded with, and it reconstructs the same typed PyApplication model and the same networkx call graph the in-process analyzer would build.

    from cldk import CLDK
    from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

    analysis = CLDK.python(
        backend=Neo4jConnectionConfig(
            uri="bolt://localhost:7687",
            username="neo4j",
            password="neo4j",
            application_name="my-service",  # matches canpy --app-name
        ),
    )
    classes = analysis.get_classes()   # Dict[str, PyClass]
    cg = analysis.get_call_graph()     # networkx.DiGraph keyed by callable signatures

The SDK’s neo4j driver is an optional extra: pip install cldk[neo4j]. See the Neo4j guide for the full read API.
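Since the call graph comes back as a networkx.DiGraph, ordinary graph traversals apply to it. As a sketch, the helper below lists everything transitively called from one callable; `reachable_callees` is a hypothetical name, and the code assumes edges point from caller to callee, which the source does not state explicitly.

```python
from collections import deque
from typing import Hashable, List


def reachable_callees(call_graph, start: Hashable) -> List[Hashable]:
    """Breadth-first list of everything transitively reachable from start.

    Works with any graph object exposing successors(node), such as
    networkx.DiGraph. Edge direction (caller -> callee) is an assumption.
    """
    seen = {start}
    order: List[Hashable] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for callee in call_graph.successors(node):
            if callee not in seen:
                seen.add(callee)
                order.append(callee)
                queue.append(callee)
    return order
```

With the graph from the snippet above, this would be called as `reachable_callees(cg, signature)`, where `signature` is one of the graph's node keys.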
Examples
    # Symbol table + call graph to stdout
    canpy --input ./proj

    # Write JSON to a directory
    canpy --input ./proj --output ./out

    # msgpack + CodeQL, eager rebuild, cache discarded
    canpy --input ./proj --output ./out --format msgpack --codeql --eager --clear-cache

    # Property-graph snapshot to ./out/graph.cypher
    canpy --input ./proj --emit neo4j --app-name proj --output ./out

    # Incremental Bolt push (password from the environment)
    NEO4J_PASSWORD=secret canpy --input ./proj --emit neo4j --app-name proj \
        --neo4j-uri bolt://localhost:7687

    # Targeted single-file push (skips orphan pruning)
    NEO4J_PASSWORD=secret canpy --input ./proj --emit neo4j --app-name proj \
        --neo4j-uri bolt://localhost:7687 --file-name src/app/routes.py

    # Publish the version-stamped schema contract
    canpy --emit schema --output ./out

    # Ambient interpreter, no per-project venv (CI / containers)
    canpy --input ./proj --no-venv --output ./out

    # Custom cache location, debug logging
    canpy --input ./proj --cache-dir /tmp/ca -vv
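When these commands are driven from a Python build script, assembling the argument list programmatically avoids shell-quoting mistakes. The helper below is a hypothetical convenience wrapper, not something canpy ships; it uses only flags from the options table and covers just a subset of them.

```python
import shlex
from typing import List, Optional


def canpy_argv(input_dir: str, output: Optional[str] = None, emit: str = "json",
               app_name: Optional[str] = None, no_venv: bool = False,
               verbosity: int = 0) -> List[str]:
    """Build an argument list for canpy from a few documented options."""
    argv = ["canpy", "--input", input_dir, "--emit", emit]
    if output is not None:
        argv += ["--output", output]
    if app_name is not None:
        argv += ["--app-name", app_name]
    if no_venv:
        argv.append("--no-venv")
    if verbosity > 0:
        argv.append("-" + "v" * verbosity)  # e.g. 2 becomes -vv
    return argv


# Show the equivalent shell command for logging purposes.
print(shlex.join(canpy_argv("./proj", output="./out", no_venv=True)))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` as-is, with secrets such as NEO4J_PASSWORD supplied through the `env` argument rather than on the command line.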