CLI usage
The canpy command runs static analysis on a Python project and builds one PyApplication artifact in memory, then emits it to your chosen target. This guide walks through the common invocations; for the full flag table see the CLI reference.
Basic analysis
The only required flag is --input (-i), the project root:
canpy --input ./my-python-project
With no --output, the analysis is printed to stdout as compact JSON. Add --output (-o) to write it to a file instead:
canpy --input ./my-python-project --output ./out
# -> ./out/analysis.json
Emit targets
canpy builds a single analysis in memory and can emit it three ways via --emit:
| --emit | Output | Needs --input? | Extra deps |
|---|---|---|---|
| json (default) | analysis.json (or analysis.msgpack) | yes | none |
| neo4j | a graph.cypher snapshot, or a live Bolt push with --neo4j-uri | yes | only the Bolt push: [neo4j] |
| schema | the version-stamped Neo4j schema.json contract | no | none |
json is the default and is what the rest of these examples build on. neo4j projects the same in-memory PyApplication into a labeled property graph (covered below). schema serializes the static, project-independent schema contract — no analysis runs, so --input is optional:
# Print the schema contract to stdout...
canpy --emit schema

# ...or write it to a directory as schema.json
canpy --emit schema --output ./out
# -> ./out/schema.json

graph LR
  SRC["Python project"] --> A["canpy<br/>(one analysis in memory)"]
  A -->|"--emit json"| J["analysis.json / .msgpack"]
  A -->|"--emit neo4j (no uri)"| C["graph.cypher snapshot"]
  A -->|"--emit neo4j --neo4j-uri"| B["live Neo4j (Bolt, incremental)"]
  A -->|"--emit schema"| S["schema.json contract"]
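A wrapper script sometimes needs to know which file to look for after a run. The mapping from the emit-targets table can be sketched as a small lookup. This is only an illustration: expected_artifact and its parameters are hypothetical names, not part of canpy, and the file names are taken from the table above.

```python
from typing import Optional


def expected_artifact(emit: str = "json", fmt: str = "json",
                      neo4j_uri: Optional[str] = None) -> Optional[str]:
    """Return the documented artifact file name, or None for a live Bolt push.

    Hypothetical helper for wrapper scripts; canpy does not expose this.
    """
    if emit == "json":
        return "analysis.msgpack" if fmt == "msgpack" else "analysis.json"
    if emit == "neo4j":
        # With a Bolt URI the graph is pushed live instead of written as a snapshot.
        return None if neo4j_uri else "graph.cypher"
    if emit == "schema":
        return "schema.json"
    raise ValueError(f"unknown emit target: {emit!r}")
```

Returning None for the live push keeps the "no file to collect" case explicit instead of inventing a placeholder name.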
The Neo4j property graph
--emit neo4j projects the analysis into a labeled property graph instead of a single JSON blob. Every node label is Py-prefixed and every relationship type is PY_-prefixed (:PyClass, :PyCallable, PY_CALLS, PY_DECLARES), so the Java, TypeScript, and Python analyzers can share one database without label or relationship-type collisions. Declarations are keyed by their signature under a shared :PySymbol label. For the full topology see the graph schema reference.
The graph is anchored at a single :PyApplication node, and there are two ways to populate it — a self-contained snapshot or a live incremental push — chosen solely by whether --neo4j-uri is set.
The application anchor: --app-name
--app-name sets the name of the single :PyApplication root node for this graph. It is the merge key (uniqueness-constrained), and everything else hangs off it via PY_HAS_MODULE. When omitted it defaults to the basename of the resolved --input directory:
canpy --input ./my-service --emit neo4j --app-name my-service
# the :PyApplication anchor is named "my-service"
The anchor name also scopes every graph mutation, so many applications can live in one database without clobbering each other:
- The graph.cypher snapshot wipes only (:PyApplication {name: <app>}) and its module subtree before reloading.
- The Bolt orphan prune on a full run is scoped to (:PyApplication {name: $app})-[:PY_HAS_MODULE]->(:PyModule), so pushing app B never deletes app A’s modules from a shared cluster.
Each graph also carries a schema_version (currently 1.1.0) stamped on its :PyApplication node. The anchor name is the value the CLDK Python SDK matches via application_name to read back exactly this app’s subgraph, so keep --app-name (CLI) and application_name (SDK) identical.
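When --app-name is omitted, a consumer still has to supply the same name as application_name. The documented default (the basename of the resolved --input directory) can be reproduced with pathlib. This is a sketch under the assumption that canpy resolves the path the way pathlib does; default_app_name is a hypothetical helper, not part of canpy or CLDK.

```python
from pathlib import Path
from typing import Optional


def default_app_name(input_dir: str, app_name: Optional[str] = None) -> str:
    """An explicit name wins; otherwise use the basename of the resolved input dir."""
    if app_name:
        return app_name
    # resolve() normalizes "./my-service/" and similar spellings before taking the name.
    return Path(input_dir).resolve().name
```

Passing the result to both the CLI and the SDK avoids the two names drifting apart.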
Snapshot vs. live push
Without --neo4j-uri, canpy writes a self-contained graph.cypher file: the constraints and indexes, a scoped DETACH DELETE of this app’s prior subgraph, then batched UNWIND ... MERGE statements for every node and edge. It needs no extra dependencies and contains the complete analysis rather than an incremental diff. With --output, the file lands in that directory; otherwise it is written to the current directory.
canpy --input ./my-service --emit neo4j --app-name my-service --output ./out
# -> ./out/graph.cypher
Load it into Neo4j with cypher-shell:
cypher-shell -u neo4j -p "$NEO4J_PASSWORD" < ./out/graph.cypher
This path is ideal for archiving a reproducible snapshot as a CI artifact, seeding a local database, or loading a graph offline with no driver installed.
With --neo4j-uri, canpy pushes to a live Neo4j over Bolt incrementally: it ensures the DDL, diffs each module’s content_hash against what is already in the database, and only rewrites the modules that changed. Shared :PyExternal / :PyPackage / :PyDecorator nodes are MERGE-only and never blindly deleted, so cross-module references survive. On a full run, modules whose source file vanished are pruned (scoped to this app’s anchor).
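The incremental behaviour described above amounts to a set comparison over content hashes. The following is a minimal sketch of that idea, not canpy's actual implementation: plan_push, its arguments, and the dictionary shape (module path mapped to content hash) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def plan_push(local: Dict[str, str], remote: Dict[str, str],
              full_run: bool = True) -> Tuple[List[str], List[str]]:
    """Return (modules to rewrite, modules to prune).

    `local` holds the hashes from the fresh analysis, `remote` the hashes
    already stored in the database for this application.
    """
    # Rewrite anything new or whose hash differs from the stored one.
    rewrite = sorted(path for path, digest in local.items()
                     if remote.get(path) != digest)
    # Prune modules whose source file vanished, but only on a full run.
    prune = sorted(set(remote) - set(local)) if full_run else []
    return rewrite, prune


local = {"a.py": "h1", "b.py": "h2-new", "d.py": "h4"}
remote = {"a.py": "h1", "b.py": "h2", "c.py": "h3"}
print(plan_push(local, remote))                  # (['b.py', 'd.py'], ['c.py'])
print(plan_push(local, remote, full_run=False))  # (['b.py', 'd.py'], [])
```

The unchanged module a.py appears in neither list, which is the property that makes re-running the push on every commit cheap.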
The live push needs the optional neo4j driver. Install the extra:
pip install 'codeanalyzer-python[neo4j]'
Point --neo4j-uri at the server. Prefer the NEO4J_PASSWORD environment variable over --neo4j-password, because the flag is visible in your shell history and the process list:
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USERNAME=neo4j
export NEO4J_PASSWORD=secret

canpy --input ./my-service --emit neo4j --app-name my-service
The connection flags each fall back to a standard environment variable when the flag is omitted (an explicit flag wins):
| Flag | Env var | Default |
|---|---|---|
| --neo4j-uri | NEO4J_URI | none (omit to write graph.cypher) |
| --neo4j-user | NEO4J_USERNAME | neo4j |
| --neo4j-password | NEO4J_PASSWORD | neo4j |
| --neo4j-database | NEO4J_DATABASE | server default |
So a fully explicit push looks like this (use env vars for the password in practice):
canpy \
  --input ./my-service \
  --emit neo4j \
  --app-name my-service \
  --neo4j-uri bolt://neo4j.internal:7687 \
  --neo4j-user neo4j \
  --neo4j-database analysis
Because the push is incremental and app-scoped, it fits a CI or scheduled job that re-analyzes on each commit: only the changed units are re-pushed, and many jobs can write app-scoped subgraphs into one shared cluster while read-only consumers fan out from it.
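The precedence used by the connection flags (explicit flag, then environment variable, then default) can be sketched in a few lines. This is an illustration only, not canpy's code: resolve_setting and its parameters are hypothetical names, and treating an empty environment variable as "set" is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_setting(flag_value: Optional[str], env_var: str,
                    default: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Explicit flag first, then the environment variable, then the default."""
    env = os.environ if env is None else env
    if flag_value is not None:
        return flag_value
    return env.get(env_var, default)
```

For example, resolve_setting(None, "NEO4J_USERNAME", "neo4j") returns the environment value when NEO4J_USERNAME is exported and "neo4j" otherwise, while a None result for NEO4J_URI corresponds to the snapshot path.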
Targeted pushes skip pruning
On a Bolt push, adding --file-name makes the run targeted rather than a full run. A targeted run rewrites only that file’s module and skips orphan pruning, so modules for deleted files are not removed. A full run (no --file-name) enables pruning of vanished modules.
# Targeted: re-push one changed file, leave everything else (no pruning)
canpy --input ./my-service --emit neo4j --app-name my-service \
  --neo4j-uri bolt://localhost:7687 --file-name src/app/routes.py

# Full run: re-analyze the whole project and prune modules whose files are gone
canpy --input ./my-service --emit neo4j --app-name my-service \
  --neo4j-uri bolt://localhost:7687
Reading the graph from the CLDK SDK
Once the graph is populated, the CLDK Python SDK can read it back without re-analyzing: no JDK, no native binary, and no project source on the consumer. The graph is produced once, out of band, by the canpy --emit neo4j job above; the SDK is a read-only client that only needs the Bolt URI and read-only credentials.
Install the SDK with its driver extra:
pip install 'cldk[neo4j]'
Pass a Neo4jConnectionConfig as the backend. Its application_name must match the --app-name the graph was loaded with:
from cldk import CLDK
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

analysis = CLDK.python(
    backend=Neo4jConnectionConfig(
        uri="bolt://localhost:7687",
        username="neo4j",
        password="neo4j",               # read-only credentials suffice
        application_name="my-service",  # matches canpy --app-name
    ),
)

classes = analysis.get_classes()  # Dict[str, PyClass]
cg = analysis.get_call_graph()    # networkx.DiGraph keyed by callable signatures
for sig, cls in classes.items():
    print(sig, list(cls.methods))
The Neo4j backend reconstructs the same typed model objects and the same networkx call graph as the in-process analyzer: get_symbol_table(), get_call_graph(), get_modules(), get_classes(), get_methods(), get_callers(), get_callees(), and get_imports() all return the identical PyClass / PyCallable models. The backend works as a context manager (with) and also exposes .close() to release the driver. Because the graph is external, project_path is optional.
Output formats (for --emit json)
When emitting JSON, the default serialization is json. Pass --format msgpack (-f) for a gzip-compressed MessagePack artifact, which is smaller and faster to load for large projects:
canpy --input ./my-python-project --output ./out --format msgpack
# -> ./out/analysis.msgpack
The CLI logs the compression ratio relative to JSON when it writes msgpack. The schema is identical across formats; only the serialization differs.
Enabling CodeQL
By default the call graph comes from Jedi’s lexical analysis. Add --codeql to resolve additional edges (including RPC, third-party, and dynamically dispatched targets) and merge them with the Jedi edges. CodeQL also backfills resolved callees on call sites that Jedi couldn’t resolve.
canpy --input ./my-python-project --codeql
Caching: eager vs lazy
Analysis is lazy by default: canpy caches results under .codeanalyzer/ and reuses the entries for files that haven’t changed (detected by mtime, size, and content hash). Pass --eager to rebuild everything from scratch:
# Lazy (default): reuse unchanged files from cache
canpy --input ./my-python-project

# Eager: rebuild the analysis and the virtual environment
canpy --input ./my-python-project --eager
Control where the cache lives with --cache-dir (-c). If unset, it defaults to .codeanalyzer in the input project directory:
canpy --input ./my-python-project --cache-dir /tmp/ca-cache
# -> /tmp/ca-cache/.codeanalyzer
By default the cache is kept after a run. Pass --clear-cache to delete it on exit (useful in CI):
canpy --input ./my-python-project --clear-cache
Single-file mode
To analyze one file rather than the whole project, pass --file-name relative to --input:
canpy --input ./my-python-project --file-name src/app/routes.py
The path must exist under --input and end in .py. As noted above, on a Bolt push --file-name also makes the run targeted and skips orphan pruning.
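The two documented constraints (the path exists under --input and ends in .py) are easy to pre-check in a wrapper script before invoking canpy. A hedged sketch follows; validate_file_name is a hypothetical helper and its error messages are invented, so canpy's own diagnostics may differ.

```python
from pathlib import Path


def validate_file_name(input_dir: str, file_name: str) -> Path:
    """Return the resolved file path, or raise ValueError if a rule is broken."""
    root = Path(input_dir).resolve()
    target = (root / file_name).resolve()
    if target.suffix != ".py":
        raise ValueError(f"{file_name} does not end in .py")
    # Reject paths such as "../other.py" that escape the project root.
    if root not in target.parents:
        raise ValueError(f"{file_name} is not under {root}")
    if not target.is_file():
        raise ValueError(f"{file_name} does not exist under {root}")
    return target
```

Resolving both paths first means a relative spelling with ".." segments is judged by where it actually points.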
Resolving imports without a venv
By default canpy builds a per-project analysis virtualenv (now provisioned with uv for parallel downloads and a shared cache, falling back to pip) and wires it to Jedi for import resolution. Pass --no-venv to skip venv creation and dependency installation entirely, resolving imports against the ambient Python interpreter instead:
canpy --input ./my-python-project --no-venv
This is useful in CI, containers, and sandboxed runs where the dependencies are already installed in the environment and building a fresh venv is wasted work. The default (--venv) builds the per-project environment.
Including test files
Test files are skipped by default: any file under a test or tests directory, or named test_*.py or *_test.py. Include them with --include-tests:
canpy --input ./my-python-project --include-tests
Parallelism with Ray
For large projects, --ray distributes symbol-table construction across workers:
canpy --input ./large-project --ray
Verbosity
The tool is quiet by default. Stack -v for progressively more logging:
canpy --input ./my-python-project -v    # info
canpy --input ./my-python-project -vv   # debug
canpy --input ./my-python-project -vvv  # trace
Putting it together
A typical CI invocation (eager rebuild, CodeQL on, msgpack out, cache discarded):
canpy \
  --input ./my-python-project \
  --output ./artifacts \
  --format msgpack \
  --codeql \
  --eager \
  --clear-cache \
  -v
And a scheduled graph-population job, an incremental Bolt push into a shared cluster with the password supplied via the environment:
export NEO4J_URI=bolt://neo4j.internal:7687
export NEO4J_PASSWORD=secret

canpy \
  --input ./my-service \
  --emit neo4j \
  --app-name my-service \
  --no-venv \
  -v