
CLI options

canpy [OPTIONS]

Static analysis on Python source code using Jedi, CodeQL, and Tree-sitter. canpy builds one analysis in memory and emits it to one of three targets selected by --emit: the default analysis.json (symbol table + call graph), a Neo4j property graph, or the version-stamped Neo4j schema contract.

| Option | Alias | Type | Default | Description |
| --- | --- | --- | --- | --- |
| --input | -i | PATH | None | Path to the project root directory. Required for --emit json and --emit neo4j; not required for --emit schema. |
| --output | -o | PATH | None | Output directory for artifacts. Behavior depends on --emit (see Output files). |
| --format | -f | json \| msgpack | json | Output serialization for --emit json. |
| --emit | | json \| neo4j \| schema | json | Output target. json → analysis.json; neo4j → graph.cypher snapshot or a live Bolt push; schema → the Neo4j schema.json contract. |
| --app-name | | TEXT | input dir name | Logical application name for the graph’s :PyApplication anchor. Defaults to the resolved basename of --input. |
| --neo4j-uri | | TEXT | None | Push the graph to a live Neo4j over Bolt (incremental). Omit to write graph.cypher. Reads NEO4J_URI. Requires the neo4j extra. |
| --neo4j-user | | TEXT | neo4j | Neo4j username. Reads NEO4J_USERNAME. |
| --neo4j-password | | TEXT | neo4j | Neo4j password. Reads NEO4J_PASSWORD. Prefer the env var: the flag is visible in shell history and the process list. |
| --neo4j-database | | TEXT | None | Neo4j database name. None uses the server default. Reads NEO4J_DATABASE. |
| --codeql / --no-codeql | | flag | --no-codeql | Enable CodeQL-based call-graph resolution in addition to Jedi. |
| --ray / --no-ray | | flag | --no-ray | Use Ray to build the symbol table in parallel. |
| --eager / --lazy | | flag | --lazy | Rebuild the analysis (and venv) from scratch vs. reuse cache. |
| --skip-tests / --include-tests | | flag | --skip-tests | Exclude or include test files in the analysis. |
| --no-venv / --venv | | flag | --venv | Resolve imports against the ambient interpreter instead of building a per-project venv. Useful in CI, containers, and sandboxed runs. |
| --file-name | | PATH | None | Analyze only this file (relative to --input; must be .py). |
| --cache-dir | -c | PATH | None | Where to store the cache. Defaults to .codeanalyzer in the input dir. |
| --clear-cache / --keep-cache | | flag | --keep-cache | Delete the cache on exit vs. retain it. |
| -v | | count | 0 | Increase verbosity: -v (info), -vv (debug), -vvv (trace). |
| --help | | | | Show the help message and exit. |

  • Lazy by default. Analysis reuses cached results for unchanged files. Use --eager to force a full rebuild.
  • Cache is kept by default. The cache survives between runs. Use --clear-cache to discard it on exit.
  • Tests excluded by default. Files under test/tests directories, or named test_*.py / *_test.py, are skipped unless you pass --include-tests.
  • CodeQL off by default. Jedi resolves the call graph alone unless --codeql is set; CodeQL augments it.
  • A venv is built by default. canpy provisions a per-project analysis venv (built with uv, falling back to pip) and wires it to Jedi for import resolution. Pass --no-venv to skip it and resolve against the ambient interpreter.
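
The test-exclusion rule in the list above can be approximated in a few lines. This is a minimal sketch of the stated rule only, not canpy's actual implementation; the function name `looks_like_test_file` and the two constants are hypothetical, and it assumes paths are given relative to --input with forward slashes.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Directory names and file-name patterns taken from the rule described above.
TEST_DIR_NAMES = {"test", "tests"}
TEST_FILE_PATTERNS = ("test_*.py", "*_test.py")


def looks_like_test_file(relative_path: str) -> bool:
    """Return True if the path would be skipped under the default --skip-tests rule."""
    path = PurePosixPath(relative_path)
    # Any parent directory named test/tests marks the file as a test file.
    in_test_dir = any(part in TEST_DIR_NAMES for part in path.parts[:-1])
    # Otherwise fall back to the file-name patterns.
    name_matches = any(fnmatchcase(path.name, p) for p in TEST_FILE_PATTERNS)
    return in_test_dir or name_matches
```

A predicate like this is handy for predicting which files a default run will ignore before deciding whether to pass --include-tests.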

Output files

What canpy produces depends on --emit:

| --emit | With --output ./out | Without --output |
| --- | --- | --- |
| json (default) | ./out/analysis.json (or analysis.msgpack with --format msgpack) | compact JSON on stdout |
| neo4j (no --neo4j-uri) | ./out/graph.cypher | graph.cypher in the current directory |
| neo4j (with --neo4j-uri) | live Bolt push (--output is unused) | live Bolt push |
| schema | ./out/schema.json | schema.json on stdout |
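
The table can also be read as a small decision function. The sketch below encodes the rows above and nothing more; `expected_destination` and its sentinel return values `"stdout"` and `"bolt"` are hypothetical names chosen for illustration, not part of canpy.

```python
from pathlib import Path
from typing import Optional


def expected_destination(
    emit: str = "json",
    output: Optional[str] = None,
    fmt: str = "json",
    neo4j_uri: Optional[str] = None,
) -> str:
    """Return the artifact path, or the sentinel 'stdout' / 'bolt', for a given flag combination."""
    if emit == "json":
        if output is None:
            # The table only documents compact JSON on stdout for this case.
            return "stdout"
        name = "analysis.msgpack" if fmt == "msgpack" else "analysis.json"
        return str(Path(output) / name)
    if emit == "neo4j":
        if neo4j_uri is not None:
            return "bolt"  # live push; --output is unused
        return str(Path(output or ".") / "graph.cypher")
    if emit == "schema":
        return "stdout" if output is None else str(Path(output) / "schema.json")
    raise ValueError(f"unknown --emit target: {emit!r}")
```

For example, `expected_destination("neo4j", output="./out")` yields the path of `graph.cypher` inside `./out`, while adding a `neo4j_uri` switches the result to the live push.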

For --emit json, --output names a directory, not a file — canpy writes analysis.json (or analysis.msgpack) inside it, creating the directory if needed. The json and msgpack formats encode the same PyApplication schema; see Output schema. The msgpack form is gzip-compressed MessagePack, and the CLI logs the compression ratio vs. JSON.
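
A consumer that accepts either serialization might load the artifact as sketched below. This assumes the whole `analysis.msgpack` file is a single gzip stream wrapping one MessagePack document, which is one reading of "gzip-compressed MessagePack" and is not confirmed here; `load_analysis` is a hypothetical helper, and the msgpack branch needs the third-party `msgpack` package.

```python
import gzip
import json
from pathlib import Path


def load_analysis(out_dir: str) -> dict:
    """Load analysis.json, or analysis.msgpack if no JSON file is present."""
    out = Path(out_dir)
    json_path = out / "analysis.json"
    if json_path.exists():
        return json.loads(json_path.read_text(encoding="utf-8"))

    msgpack_path = out / "analysis.msgpack"
    if msgpack_path.exists():
        import msgpack  # third-party: pip install msgpack

        # Assumption: gzip wraps the entire MessagePack payload.
        raw = gzip.decompress(msgpack_path.read_bytes())
        return msgpack.unpackb(raw, raw=False)

    raise FileNotFoundError(f"no analysis.json or analysis.msgpack in {out}")
```

Because both formats encode the same schema, downstream code can work with the returned dictionary without caring which file was on disk.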

--emit chooses where the single in-memory analysis goes. The analysis is built once; the target only decides how it is projected.

--emit json (the default) writes the canonical symbol table and call graph as one JSON document. This is the in-process artifact every other tool has always consumed.

# Symbol table + call graph to stdout
canpy --input ./my-python-project
# Write JSON to a directory
canpy --input ./my-python-project --output ./out
# msgpack + CodeQL, eager rebuild, cache discarded
canpy --input ./my-python-project --output ./out --format msgpack --codeql --eager --clear-cache
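
The stdout mode is convenient when driving canpy from another program. The following is a rough sketch that uses only flags documented on this page; the helper names are hypothetical, and it assumes stdout carries nothing but the JSON document (log output going elsewhere), which this page does not state explicitly.

```python
import json
import subprocess
from typing import List


def build_canpy_argv(
    project_root: str,
    *,
    codeql: bool = False,
    eager: bool = False,
    include_tests: bool = False,
) -> List[str]:
    """Assemble a canpy command line for the default --emit json target."""
    argv = ["canpy", "--input", str(project_root)]
    if codeql:
        argv.append("--codeql")
    if eager:
        argv.append("--eager")
    if include_tests:
        argv.append("--include-tests")
    return argv


def run_canpy_json(project_root: str, **options: bool) -> dict:
    """Run canpy without --output and parse the JSON it prints to stdout."""
    result = subprocess.run(
        build_canpy_argv(project_root, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Keeping argument assembly separate from execution makes the command easy to log or unit-test without invoking the analyzer.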

--emit neo4j projects the same in-memory PyApplication into a labeled property graph. Every node label is Py-prefixed and every relationship type is PY_-prefixed (e.g. :PyClass, PY_CALLS), so Java, TypeScript, and Python analyzers can share one database without label or relationship-type collisions. Declarations are keyed by their signature under a shared :PySymbol label, and the application is anchored at a single :PyApplication node named by --app-name. The full topology is documented in the graph schema reference.

There are two sub-modes, decided solely by whether --neo4j-uri is set.

Without --neo4j-uri, canpy writes a self-contained graph.cypher snapshot. The script contains, in order: constraints and indexes, a scoped DETACH DELETE wipe of this app’s prior subtree, then batched UNWIND … MERGE statements for nodes and edges. It needs no extra dependencies and always expresses the full state of the analysis; it is not incremental. Load it with cypher-shell:

canpy --input ./my-python-project --emit neo4j --app-name my-service --output ./out
cypher-shell < ./out/graph.cypher
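
To make the batched UNWIND and MERGE pattern concrete, here is a small sketch of how rows could be grouped into batches and paired with one idempotent statement. It illustrates the general technique only and is not the literal contents of graph.cypher; the property name `signature`, the `row.props` shape, and the batch size are assumptions.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical statement: MERGE on the key, then set the remaining properties.
MERGE_SYMBOLS = (
    "UNWIND $rows AS row\n"
    "MERGE (s:PySymbol {signature: row.signature})\n"
    "SET s += row.props"
)


def chunked(rows: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def merge_batches(rows: List[dict], batch_size: int = 1000) -> List[Tuple[str, Dict[str, list]]]:
    """Pair the MERGE statement with one parameter map per batch."""
    return [(MERGE_SYMBOLS, {"rows": batch}) for batch in chunked(rows, batch_size)]
```

Because MERGE matches on the key before creating anything, replaying a batch does not duplicate nodes, and batching keeps each statement to a bounded size.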

With --neo4j-uri, canpy performs an incremental live push over Bolt. It ensures the DDL exists, diffs each module’s content_hash against the database, and rewrites only modules whose content changed. Shared :PyExternal / :PyPackage / :PyDecorator nodes are MERGE-only and nodes are never blindly deleted, so cross-module references survive. On a full run (no --file-name), modules whose source file vanished are pruned. Pruning is scoped to this app’s :PyApplication anchor, so pushing app B never deletes app A’s modules from a shared database. Every graph carries a schema_version on its :PyApplication node (currently 1.1.0).
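
The diff-and-prune decision can be sketched as a pure function over two hash maps. This is an illustration of the behavior described above under simplifying assumptions, not canpy's code: `plan_push` is a hypothetical name, modules are identified by their relative path, and `remote` is assumed to contain only modules under this app's anchor.

```python
from typing import Dict, List, Optional, Tuple


def plan_push(
    local: Dict[str, str],
    remote: Dict[str, str],
    only: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Return (modules to rewrite, modules to prune).

    local / remote map module path -> content_hash.
    only mimics --file-name: restrict the push to one module and skip pruning.
    """
    if only is None:
        candidates = local
    else:
        candidates = {path: h for path, h in local.items() if path == only}

    # Rewrite modules that are new or whose hash differs from the database.
    rewrite = sorted(path for path, h in candidates.items() if remote.get(path) != h)

    # Prune vanished modules only on a full run.
    prune = sorted(set(remote) - set(local)) if only is None else []
    return rewrite, prune
```

Unchanged modules appear in neither list, which is what keeps repeated pushes cheap.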

Prefer the environment for the password so it never lands in shell history or the process list:

export NEO4J_URI=bolt://localhost:7687
export NEO4J_PASSWORD=secret
canpy --input ./my-python-project --emit neo4j --app-name my-service

The same run spelled out with flags:

canpy --input ./my-python-project --emit neo4j --app-name my-service \
--neo4j-uri bolt://localhost:7687 --neo4j-user neo4j
# NEO4J_PASSWORD is read from the environment
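
The interplay of flags, environment variables, and defaults can be modeled as follows. This sketch assumes an explicit flag takes precedence over the environment variable, which is the usual convention but is not stated on this page; `resolve_neo4j_settings` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_neo4j_settings(
    uri: Optional[str] = None,
    user: Optional[str] = None,
    password: Optional[str] = None,
    database: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Resolve each setting as: explicit flag, then environment variable, then default."""
    env = os.environ if env is None else env
    return {
        # No default: a missing URI selects the graph.cypher snapshot mode.
        "uri": uri or env.get("NEO4J_URI"),
        "user": user or env.get("NEO4J_USERNAME") or "neo4j",
        "password": password or env.get("NEO4J_PASSWORD") or "neo4j",
        # None means the server's default database.
        "database": database or env.get("NEO4J_DATABASE"),
    }
```

Passing `env` explicitly keeps the function easy to test and avoids reading secrets from the real process environment during tests.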

The live Bolt path needs the optional neo4j driver extra. If it is missing, canpy raises a clear error telling you to install it:

pip install 'codeanalyzer-python[neo4j]'

The snapshot (graph.cypher) and --emit schema modes need nothing extra.

--emit schema emits the machine-readable, version-stamped Neo4j schema (schema.json: node labels, relationship types, and their properties). It is a static catalog, so no project is required and --input is optional.

# Print the schema contract to stdout (no project needed)
canpy --emit schema
# Write it to a directory
canpy --emit schema --output ./out # → ./out/schema.json

The schema carries SCHEMA_VERSION (currently 1.1.0), the same value stamped onto every graph’s :PyApplication node. Pin to it so consumers can detect contract changes.
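
A consumer could pin to the version along these lines. The sketch makes two assumptions that this page does not confirm: that schema.json exposes the version under a top-level key (called `schema_version` here), and that versions follow semantic-versioning rules in which a major bump signals a breaking change. All function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.1.0' into (1, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(found: str, pinned: str) -> bool:
    """Same major version, and at least as new as the pinned version."""
    f, p = parse_version(found), parse_version(pinned)
    return f[0] == p[0] and f >= p


def check_schema_file(path: str, pinned: str, key: str = "schema_version") -> bool:
    """Read schema.json and compare its version (under the assumed key) to the pin."""
    schema = json.loads(Path(path).read_text(encoding="utf-8"))
    return is_compatible(schema[key], pinned)
```

A check like this can run at consumer startup so that a contract change fails fast instead of surfacing as a confusing query error.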

Once a graph is in Neo4j, the CLDK Python SDK reads it directly — no JDK, no native binary, and no project source on the consumer, only read-only Neo4j credentials. Point CLDK.python at a Neo4jConnectionConfig whose application_name matches the --app-name the graph was loaded with, and it reconstructs the same typed PyApplication model and the same networkx call graph the in-process analyzer would build.

from cldk import CLDK
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

analysis = CLDK.python(
    backend=Neo4jConnectionConfig(
        uri="bolt://localhost:7687",
        username="neo4j",
        password="neo4j",
        application_name="my-service",  # matches canpy --app-name
    ),
)
classes = analysis.get_classes()  # Dict[str, PyClass]
cg = analysis.get_call_graph()  # networkx.DiGraph keyed by callable signatures
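
Since the call graph is a networkx.DiGraph, ordinary graph traversals apply to it. The sketch below collects everything transitively reachable from one callable using only the graph's `successors()` method; it assumes edges point from caller to callee, and `reachable_callables` is a hypothetical helper rather than part of the SDK.

```python
from collections import deque
from typing import Hashable, Set


def reachable_callables(call_graph, start: Hashable) -> Set[Hashable]:
    """Breadth-first walk over successors, returning every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for callee in call_graph.successors(node):
            if callee not in seen:  # guards against cycles (recursion)
                seen.add(callee)
                queue.append(callee)
    seen.discard(start)  # report only the callees, not the starting point
    return seen
```

With networkx imported, `networkx.descendants(cg, start)` computes the same set; the explicit loop is shown only to make the traversal visible.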

The SDK’s neo4j driver is an optional extra: pip install 'cldk[neo4j]' (the quotes keep shells such as zsh from expanding the brackets). See the Neo4j guide for the full read API.

# Symbol table + call graph to stdout
canpy --input ./proj
# Write JSON to a directory
canpy --input ./proj --output ./out
# msgpack + CodeQL, eager rebuild, cache discarded
canpy --input ./proj --output ./out --format msgpack --codeql --eager --clear-cache
# Property-graph snapshot to ./out/graph.cypher
canpy --input ./proj --emit neo4j --app-name proj --output ./out
# Incremental Bolt push (password from the environment)
NEO4J_PASSWORD=secret canpy --input ./proj --emit neo4j --app-name proj \
--neo4j-uri bolt://localhost:7687
# Targeted single-file push (skips orphan pruning)
NEO4J_PASSWORD=secret canpy --input ./proj --emit neo4j --app-name proj \
--neo4j-uri bolt://localhost:7687 --file-name src/app/routes.py
# Publish the version-stamped schema contract
canpy --emit schema --output ./out
# Ambient interpreter, no per-project venv (CI / containers)
canpy --input ./proj --no-venv --output ./out
# Custom cache location, debug logging
canpy --input ./proj --cache-dir /tmp/ca -vv