
CLI usage

The canpy command runs static analysis on a Python project and builds one PyApplication artifact in memory, then emits it to your chosen target. This guide walks through the common invocations; for the full flag table see the CLI reference.

The only required flag is --input (-i), the project root:

canpy --input ./my-python-project

With no --output, the analysis is printed to stdout as compact JSON. Add --output (-o) to write it to a file instead:

canpy --input ./my-python-project --output ./out
# -> ./out/analysis.json

canpy builds a single analysis in memory and can emit it three ways via --emit:

| --emit | Output | Needs --input? | Extra deps |
|---|---|---|---|
| json (default) | analysis.json (or analysis.msgpack) | yes | none |
| neo4j | a graph.cypher snapshot, or a live Bolt push with --neo4j-uri | yes | only the Bolt push: [neo4j] |
| schema | the version-stamped Neo4j schema.json contract | no | none |
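The routing rules in the table and the paragraphs around it can be restated as a small lookup. This is a hypothetical sketch, not canpy's code: the function name `resolve_emit_target` and its return strings are illustrative, and it encodes only what this page states (json and schema print to stdout without --output; a neo4j snapshot without --output lands in the current directory).

```python
from pathlib import Path
from typing import Optional


def resolve_emit_target(
    emit: str = "json",
    output: Optional[str] = None,
    fmt: str = "json",
    neo4j_uri: Optional[str] = None,
) -> str:
    """Describe where a canpy run would send its result (hypothetical helper).

    Encodes the documented rules only; it does not invoke canpy.
    """
    if emit == "json":
        if output is None:
            return "stdout"  # per the docs: compact JSON printed to stdout
        name = "analysis.msgpack" if fmt == "msgpack" else "analysis.json"
        return str(Path(output) / name)
    if emit == "neo4j":
        if neo4j_uri is not None:
            return f"bolt push to {neo4j_uri}"  # live, incremental push
        # Snapshot goes to --output, or to the current directory if unset.
        return str(Path(output or ".") / "graph.cypher")
    if emit == "schema":
        return "stdout" if output is None else str(Path(output) / "schema.json")
    raise ValueError(f"unknown --emit value: {emit!r}")
```

For example, `resolve_emit_target("neo4j", "./out")` points at `out/graph.cypher`, matching the snapshot example further down.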

json is the default and is what the rest of these examples build on. neo4j projects the same in-memory PyApplication into a labeled property graph (covered below). schema serializes the static, project-independent schema contract — no analysis runs, so --input is optional:

# Print the schema contract to stdout...
canpy --emit schema
# ...or write it to a directory as schema.json
canpy --emit schema --output ./out
# -> ./out/schema.json

Figure: one analysis in memory, four emit routes. --emit json -> analysis.json / .msgpack; --emit neo4j (no URI) -> graph.cypher snapshot; --emit neo4j --neo4j-uri -> live Neo4j (Bolt, incremental); --emit schema -> schema.json contract.

--emit neo4j projects the analysis into a labeled property graph instead of a single JSON blob. Every node label is Py-prefixed and every relationship type is PY_-prefixed (:PyClass, :PyCallable, PY_CALLS, PY_DECLARES), so the Java, TypeScript, and Python analyzers can share one database without label or relationship-type collisions. Declarations are keyed by their signature under a shared :PySymbol label. For the full topology see the graph schema reference.

The graph is anchored at a single :PyApplication node, and there are two ways to populate it — a self-contained snapshot or a live incremental push — chosen solely by whether --neo4j-uri is set.

--app-name sets the name of the single :PyApplication root node for this graph. It is the merge key (uniqueness-constrained), and everything else hangs off it via PY_HAS_MODULE. When omitted it defaults to the basename of the resolved --input directory:

canpy --input ./my-service --emit neo4j --app-name my-service
# the :PyApplication anchor is named "my-service"
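The defaulting rule for --app-name is simple enough to state in code. A minimal sketch, assuming "basename of the resolved --input directory" means the last component of the absolute path; `default_app_name` is a hypothetical helper, not part of canpy.

```python
from pathlib import Path
from typing import Optional


def default_app_name(input_dir: str, app_name: Optional[str] = None) -> str:
    """Mirror the documented --app-name default (hypothetical helper).

    An explicit name wins; otherwise use the basename of the resolved
    --input directory.
    """
    if app_name:
        return app_name
    # resolve() makes "." or "./my-service/" yield a real directory name.
    return Path(input_dir).resolve().name
```

This is why `--input ./my-service` alone would already produce an anchor named `my-service`; passing --app-name explicitly just makes the name independent of where the checkout lives.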

The anchor name also scopes every graph mutation, so many applications can live in one database without clobbering each other:

  • The graph.cypher snapshot wipes only (:PyApplication {name: <app>}) and its module subtree before reloading.
  • The Bolt orphan prune on a full run is scoped to (:PyApplication {name: $app})-[:PY_HAS_MODULE]->(:PyModule), so pushing app B never deletes app A’s modules from a shared cluster.

Each graph also carries a schema_version (currently 1.1.0) stamped on its :PyApplication node. The node’s name is what the CLDK Python SDK matches via application_name to read back exactly this app’s subgraph, so keep --app-name (CLI) and application_name (SDK) identical.
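Because every query is scoped by the anchor name, checking which schema version a given app was loaded with is a single parameterized lookup. A sketch under stated assumptions: the anchor stores its name in a `name` property (as in the `{name: <app>}` patterns above) and the version in a `schema_version` property; `read_schema_version` is hypothetical and takes a driver from the official `neo4j` package (`GraphDatabase.driver(...)`) rather than importing it.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed property names: `name` on the anchor, `schema_version` for the stamp.
SCHEMA_VERSION_QUERY = (
    "MATCH (a:PyApplication {name: $app}) "
    "RETURN a.schema_version AS schema_version"
)


def schema_version_query(app_name: str) -> Tuple[str, Dict[str, Any]]:
    """Return a Cypher query plus parameters scoped to one application."""
    return SCHEMA_VERSION_QUERY, {"app": app_name}


def read_schema_version(driver: Any, app_name: str) -> Optional[str]:
    """Read the stamped version for one app (None if the app is absent).

    `driver` is expected to be a neo4j Driver; the session is closed
    before returning, and the record is consumed inside it.
    """
    query, params = schema_version_query(app_name)
    with driver.session() as session:
        record = session.run(query, params).single()
    return None if record is None else record["schema_version"]
```

Passing the app name as a parameter (`$app`) rather than formatting it into the string keeps the query safe for arbitrary names and lets Neo4j reuse the query plan.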

Without --neo4j-uri, canpy writes a self-contained graph.cypher file: the constraints and indexes, a scoped DETACH DELETE of this app’s prior subgraph, then batched UNWIND ... MERGE statements for every node and edge. It needs no extra dependencies and captures the complete analysis rather than an incremental update. With --output, the file lands in that directory; otherwise it is written to the current directory.

canpy --input ./my-service --emit neo4j --app-name my-service --output ./out
# -> ./out/graph.cypher

Load it into Neo4j with cypher-shell:

cypher-shell -u neo4j -p "$NEO4J_PASSWORD" < ./out/graph.cypher

This path is ideal for committing a reproducible snapshot to CI artifacts, seeding a local database, or loading a graph offline with no driver installed.
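To make the snapshot's structure concrete, here is a sketch that generates the same kind of script for modules only: a uniqueness constraint, a wipe scoped to one anchor, and batched UNWIND ... MERGE statements. It is not what canpy actually writes. The helper names, the `path` merge key on :PyModule, and the simplified wipe (anchor and modules, not the full module subtree) are all assumptions.

```python
import json
from typing import Any, Dict, Iterator, List


def cypher_literal(value: Any) -> str:
    """Render a Python value as a Cypher literal (no parameters needed)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value)  # JSON string escapes are valid in Cypher
    if isinstance(value, list):
        return "[" + ", ".join(cypher_literal(v) for v in value) + "]"
    if isinstance(value, dict):
        # Cypher map keys are identifiers, so quote them with backticks.
        return "{" + ", ".join(f"`{k}`: {cypher_literal(v)}" for k, v in value.items()) + "}"
    raise TypeError(f"unsupported value: {value!r}")


def batched(rows: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def module_snapshot(app: str, modules: List[Dict[str, Any]], batch_size: int = 500) -> str:
    """Build a wipe-and-reload script scoped to one application's modules."""
    anchor = f"(a:PyApplication {{name: {cypher_literal(app)}}})"
    statements = [
        "CREATE CONSTRAINT IF NOT EXISTS FOR (a:PyApplication) REQUIRE a.name IS UNIQUE;",
        # Scoped wipe: only this app's anchor and modules; other apps are untouched.
        f"MATCH {anchor} OPTIONAL MATCH (a)-[:PY_HAS_MODULE]->(m:PyModule) DETACH DELETE m, a;",
        f"MERGE {anchor};",
    ]
    for batch in batched(modules, batch_size):
        # MERGE the whole path from the bound anchor so modules stay per app.
        statements.append(
            f"MATCH {anchor} UNWIND {cypher_literal(batch)} AS row "
            "MERGE (a)-[:PY_HAS_MODULE]->(m:PyModule {path: row.path}) SET m += row;"
        )
    return "\n".join(statements) + "\n"
```

Batching keeps each statement bounded in size, and merging modules through the anchor is one way to get the property the bullets above describe: reloading app B cannot touch app A's nodes.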

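With --neo4j-uri, canpy instead pushes the graph to a live Neo4j instance over Bolt, incrementally; this is the only emit path that needs the [neo4j] extra. On a Bolt push, adding --file-name makes the run targeted rather than full: a targeted run rewrites only that file’s module and skips orphan pruning, so modules for deleted files are not removed. A full run (no --file-name) also prunes modules whose files have vanished.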

# Targeted: re-push one changed file, leave everything else (no pruning)
canpy --input ./my-service --emit neo4j --app-name my-service \
--neo4j-uri bolt://localhost:7687 --file-name src/app/routes.py
# Full run: re-analyze the whole project and prune modules whose files are gone
canpy --input ./my-service --emit neo4j --app-name my-service \
--neo4j-uri bolt://localhost:7687
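The pruning rule reduces to a set difference that only applies on full runs. A minimal sketch with a hypothetical helper: `graph_modules` is assumed to be the set of module paths already hanging off this app's :PyApplication anchor (never another app's), and `current_modules` the set the current analysis produced.

```python
from typing import Optional, Set


def modules_to_prune(
    graph_modules: Set[str],
    current_modules: Set[str],
    file_name: Optional[str] = None,
) -> Set[str]:
    """Return module paths a Bolt push would delete (hypothetical helper).

    graph_modules: modules already under this app's anchor (scoped by app).
    current_modules: modules produced by the current analysis.
    """
    if file_name is not None:
        # Targeted run: rewrite one module, never prune anything.
        return set()
    # Full run: anything in the graph that the analysis no longer produced.
    return graph_modules - current_modules
```

The design trade-off is visible here: a targeted run only sees one file, so it cannot tell a deleted module from one it simply did not analyze, which is why pruning is reserved for full runs.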

Once the graph is populated, the CLDK Python SDK can read it back without re-analyzing: no JDK, no native binary, and no project source on the consumer. The graph is produced once, out of band, by the canpy --emit neo4j job above; the SDK is a read-only client that needs only the Bolt URI and read-only credentials.

Install the SDK with its driver extra:

pip install 'cldk[neo4j]'

Pass a Neo4jConnectionConfig as the backend. Its application_name must match the --app-name the graph was loaded with:

from cldk import CLDK
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig
analysis = CLDK.python(
    backend=Neo4jConnectionConfig(
        uri="bolt://localhost:7687",
        username="neo4j",
        password="neo4j",  # read-only credentials suffice
        application_name="my-service",  # must match canpy --app-name
    ),
)
classes = analysis.get_classes()  # Dict[str, PyClass]
cg = analysis.get_call_graph()  # networkx.DiGraph keyed by callable signatures
for sig, cls in classes.items():
    print(sig, list(cls.methods))

The Neo4j backend reconstructs the same typed model objects and the same networkx call graph as the in-process analyzer: get_symbol_table(), get_call_graph(), get_modules(), get_classes(), get_methods(), get_callers(), get_callees(), and get_imports() return the same types they do in process, including the same PyClass and PyCallable models. The backend is a context manager, so use it in a with block or call .close() to release the driver. Because the graph is external, project_path is optional.

When emitting JSON, the default serialization is json. Pass --format msgpack (-f) for a gzip-compressed MessagePack artifact — smaller and faster to load for large projects:

canpy --input ./my-python-project --output ./out --format msgpack
# -> ./out/analysis.msgpack

The CLI logs the compression ratio relative to JSON when it writes msgpack. The schema is identical across formats; only the serialization differs.
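Since the schema is identical across formats, a consumer only has to pick the right decoder. A sketch, assuming the .msgpack file is gzip-wrapped MessagePack as described above and that the third-party `msgpack` package is installed for that branch; `load_analysis` is a hypothetical helper, and it returns plain dicts and lists rather than typed models.

```python
import gzip
import json
from pathlib import Path
from typing import Any


def load_analysis(path: str) -> Any:
    """Load an artifact written with --format json or --format msgpack.

    msgpack is imported lazily so loading JSON needs no extra dependency.
    """
    p = Path(path)
    if p.suffix == ".msgpack":
        import msgpack  # third-party: pip install msgpack

        with gzip.open(p, "rb") as fh:
            # raw=False decodes MessagePack strings to str instead of bytes.
            return msgpack.unpackb(fh.read(), raw=False)
    with p.open("r", encoding="utf-8") as fh:
        return json.load(fh)
```

Dispatching on the file suffix mirrors how canpy names its output (analysis.json vs analysis.msgpack), so the same loader works for both CI artifacts.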

By default the call graph comes from Jedi’s lexical analysis. Add --codeql to resolve additional edges — including RPC, third-party, and dynamically-dispatched targets — and merge them with the Jedi edges. CodeQL also backfills resolved callees on Jedi call sites it couldn’t resolve.

canpy --input ./my-python-project --codeql
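The merge-and-backfill step can be pictured with plain sets and dicts. This is an illustrative sketch of the idea only: the data model (edges as `(caller, callee)` signature pairs, call sites keyed by `(caller, line)` with `None` for an unresolved callee) and the helper name are assumptions, not canpy's internal representation.

```python
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[str, str]     # (caller signature, callee signature)
SiteKey = Tuple[str, int]  # (caller signature, line number of the call)


def merge_call_graph(
    jedi_edges: Set[Edge],
    codeql_edges: Set[Edge],
    jedi_sites: Dict[SiteKey, Optional[str]],
    codeql_sites: Dict[SiteKey, str],
) -> Tuple[Set[Edge], Dict[SiteKey, Optional[str]]]:
    """Union the two edge sets and backfill unresolved Jedi call sites."""
    edges = jedi_edges | codeql_edges
    sites = dict(jedi_sites)
    for key, callee in list(sites.items()):
        # Only fill gaps: a site Jedi already resolved keeps Jedi's answer.
        if callee is None and key in codeql_sites:
            sites[key] = codeql_sites[key]
    return edges, sites
```

Keeping Jedi's resolutions and using CodeQL only where Jedi returned nothing matches the text: CodeQL adds edges and backfills, rather than overriding.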

Analysis is lazy by default: canpy caches results under .codeanalyzer/ and reuses the entries for files that haven’t changed (detected by mtime, size, and content hash). Pass --eager to rebuild everything from scratch:

# Lazy (default) — reuse unchanged files from cache
canpy --input ./my-python-project
# Eager — rebuild the analysis and the virtual environment
canpy --input ./my-python-project --eager
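One plausible way to combine the three change signals (mtime, size, content hash) is to use the cheap ones first and hash only when they disagree. This is a sketch of that idea, not canpy's cache logic: `Fingerprint`, `fingerprint`, and `is_unchanged` are hypothetical, and the exact policy is an assumption.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fingerprint:
    mtime_ns: int
    size: int
    sha256: str


def fingerprint(path: str) -> Fingerprint:
    """Record mtime, size, and a SHA-256 of the file's bytes."""
    st = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return Fingerprint(st.st_mtime_ns, st.st_size, digest.hexdigest())


def is_unchanged(path: str, cached: Optional[Fingerprint]) -> bool:
    """True if a cached entry for `path` could be reused in lazy mode."""
    if cached is None:
        return False
    st = os.stat(path)
    if st.st_size != cached.size:
        return False  # a size change always means a content change
    if st.st_mtime_ns == cached.mtime_ns:
        return True  # same size and mtime: skip hashing
    # mtime moved (e.g. a touch or fresh checkout): let the hash decide.
    return fingerprint(path).sha256 == cached.sha256
```

The hash fallback matters in CI, where a fresh checkout gives every file a new mtime even though most contents are unchanged.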

Control where the cache lives with --cache-dir (-c). If unset, it defaults to .codeanalyzer in the input project directory:

canpy --input ./my-python-project --cache-dir /tmp/ca-cache
# -> /tmp/ca-cache/.codeanalyzer
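As the example shows, --cache-dir names the parent of the .codeanalyzer directory rather than the cache directory itself. A tiny hypothetical helper makes that explicit:

```python
from pathlib import Path
from typing import Optional


def resolve_cache_dir(input_dir: str, cache_dir: Optional[str] = None) -> Path:
    """Where the cache would live (hypothetical helper).

    --cache-dir is the parent directory; without it, the cache sits
    inside the --input project directory.
    """
    base = Path(cache_dir) if cache_dir else Path(input_dir)
    return base / ".codeanalyzer"
```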

By default the cache is kept after a run. Pass --clear-cache to delete it on exit (useful in CI):

canpy --input ./my-python-project --clear-cache

To analyze one file rather than the whole project, pass --file-name relative to --input:

canpy --input ./my-python-project --file-name src/app/routes.py

The path must exist under --input and end in .py. As noted above, on a Bolt push --file-name also makes the run targeted and skips orphan pruning.
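The --file-name constraints can be checked before a run. A sketch of equivalent validation, assuming "under --input" also means the path must not escape the project via `..`; `validate_file_name` is hypothetical and not canpy's own check.

```python
from pathlib import Path


def validate_file_name(input_dir: str, file_name: str) -> Path:
    """Validate a --file-name value relative to --input (hypothetical helper).

    The path must stay inside the project, end in .py, and exist.
    Returns the resolved absolute path.
    """
    root = Path(input_dir).resolve()
    target = (root / file_name).resolve()
    if root not in target.parents:
        raise ValueError(f"{file_name} is not under {root}")
    if target.suffix != ".py":
        raise ValueError(f"{file_name} does not end in .py")
    if not target.is_file():
        raise FileNotFoundError(str(target))
    return target
```

Resolving both paths before comparing them handles `..` segments and symlinked temp directories consistently.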

By default canpy builds a per-project analysis virtualenv, provisioned with uv (parallel downloads and a shared cache, with a fallback to pip), and wires it to Jedi for import resolution. Pass --no-venv to skip venv creation and dependency installation entirely and resolve imports against the ambient Python interpreter instead:

canpy --input ./my-python-project --no-venv

This is useful in CI, containers, and sandboxed runs where the dependencies are already installed in the environment and building a fresh venv is wasted work. The default (--venv) builds the per-project environment.

Test files are skipped by default — any file under a test/tests directory, or named test_*.py / *_test.py. Include them with --include-tests:

canpy --input ./my-python-project --include-tests
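The default test-file rule can be written as a path predicate. A sketch of the rule as stated on this page (a parent directory named test or tests, or a file named test_*.py or *_test.py); `is_test_file` is hypothetical and canpy's matcher may differ in edge cases.

```python
from pathlib import PurePosixPath


def is_test_file(rel_path: str) -> bool:
    """Apply the documented default test-file rule (hypothetical helper)."""
    p = PurePosixPath(rel_path)
    # Any parent directory literally named test or tests excludes the file.
    if any(part in ("test", "tests") for part in p.parts[:-1]):
        return True
    # Otherwise fall back to the file-name patterns test_*.py and *_test.py.
    return p.suffix == ".py" and (p.name.startswith("test_") or p.stem.endswith("_test"))
```

Note that a directory such as `testing/` is not matched: only exact `test` and `tests` directory names count.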

For large projects, --ray distributes symbol-table construction across workers:

canpy --input ./large-project --ray

The tool is quiet by default. Stack -v for progressively more logging:

canpy --input ./my-python-project -v # info
canpy --input ./my-python-project -vv # debug
canpy --input ./my-python-project -vvv # trace
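Stackable -v flags are commonly implemented with argparse's `count` action plus a level table. A sketch of that pattern; the mapping (WARNING when quiet, and a custom TRACE level of 5 for -vvv) is an assumption, since Python's logging module defines no TRACE level and this page does not say which levels canpy uses.

```python
import argparse
import logging
from typing import List

TRACE = 5  # assumption: a custom level below DEBUG (10); stdlib has no TRACE
logging.addLevelName(TRACE, "TRACE")


def parse_verbosity(argv: List[str]) -> int:
    """Count repeated -v flags (-v, -vv, -vvv) with argparse."""
    parser = argparse.ArgumentParser(prog="verbosity-sketch")
    parser.add_argument("-v", action="count", default=0)
    return parser.parse_args(argv).v


def level_for(verbosity: int) -> int:
    """Map a -v count to a logging level (hypothetical mapping)."""
    if verbosity <= 0:
        return logging.WARNING  # quiet by default: warnings and errors only
    if verbosity == 1:
        return logging.INFO
    if verbosity == 2:
        return logging.DEBUG
    return TRACE
```

A program would then call `logging.basicConfig(level=level_for(parse_verbosity(sys.argv[1:])))` once at startup.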

A typical CI invocation — eager rebuild, CodeQL on, msgpack out, cache discarded:

canpy \
--input ./my-python-project \
--output ./artifacts \
--format msgpack \
--codeql \
--eager \
--clear-cache \
-v

And a scheduled graph-population job: an incremental Bolt push into a shared cluster, with the Neo4j URI and password supplied via the environment:

export NEO4J_URI=bolt://neo4j.internal:7687
export NEO4J_PASSWORD=secret
canpy \
--input ./my-service \
--emit neo4j \
--app-name my-service \
--no-venv \
-v