codeanalyzer-python (`canpy`)

codeanalyzer-python (`canpy`)

A Python static-analysis toolkit — the CLDK backend that emits a canonical symbol table and call graph, as analysis.json or a Neo4j property graph.

canpy is a static analyzer for Python built on Jedi, with optional CodeQL-resolved call edges and Tree-sitter parsing. It produces the canonical CodeLLM-DevKit (CLDK) analysis.json — a symbol table plus a call graph — and can project that same analysis into a Neo4j property graph. It is the Python backend behind CLDK, mirroring its TypeScript (cants) and Java siblings.

Every run produces a symbol table and a call graph. Edges come from Jedi's lexical resolution by default; --codeql resolves additional edges (RPC / third-party / dynamically-dispatched targets) and merges them with the Jedi-derived edges, also backfilling callees Jedi could not resolve.

Features

Symbol table — modules, classes, functions, methods, variables, decorators, imports, and docstrings, with precise source spans.
Call graph — Jedi's lexical resolver by default, with optional CodeQL-resolved edges (--codeql) for RPC / third-party / dynamically-dispatched targets, merged with the Jedi edges; CodeQL also backfills callees Jedi could not resolve.
Neo4j output — project the analysis into a labeled property graph: a self-contained graph.cypher snapshot, or an incremental push to a live database over Bolt.
Versioned schema — a machine-readable, version-stamped Neo4j schema contract (--emit schema), checked in as schema.neo4j.json and shipped with every release.
Incremental cache — per-file results are cached under .codeanalyzer; --lazy (default) reuses them, --eager forces a clean rebuild. --ray distributes the work across cores.
Compact output — canonical analysis.json, or binary analysis.msgpack for smaller artifacts.

Installation

Prerequisites

Python 3.10 or newer.

A C toolchain and the venv / development headers — the analyzer builds an isolated virtual environment per project (via Python's venv) so Jedi can resolve types and imports:

# Ubuntu / Debian
sudo apt install python3-venv python3-dev build-essential

# Fedora / RHEL / CentOS
sudo dnf group install "Development Tools" && sudo dnf install python3-venv python3-devel

# macOS
xcode-select --install

Install via pip (PyPI)

pip install codeanalyzer-python
canpy --help

For the optional live Neo4j push (--emit neo4j --neo4j-uri …), install the neo4j extra:

pip install 'codeanalyzer-python[neo4j]'

Install via shell script

Install the CLI as an isolated tool with the one-line installer (provisions via uv / pipx / pip):

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/codellm-devkit/codeanalyzer-python/releases/latest/download/canpy-installer.sh | sh

Install via Homebrew

brew install codellm-devkit/tap/codeanalyzer-python

The formula depends on uv and installs canpy as an isolated, version-pinned uv tool (the package and its dependencies are resolved and cached on first run).

Build from source

This project uses uv for dependency management.

git clone https://github.com/codellm-devkit/codeanalyzer-python
cd codeanalyzer-python
uv sync --all-groups
uv run canpy --help

Usage

canpy --input /path/to/python/project

With no --output, the analysis is printed to stdout as compact JSON; with --output <dir> it is written to analysis.json (or graph.cypher for --emit neo4j, or analysis.msgpack with --format msgpack) in that directory.

Options

$ canpy --help

 Usage: canpy [OPTIONS] COMMAND [ARGS]...

 Static Analysis on Python source code using Jedi, CodeQL and Tree sitter.

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
│ --input           -i                     PATH                 Path to the project root directory │
│                                                               (not required for --emit schema).  │
│ --output          -o                     PATH                 Output directory for artifacts.    │
│ --format          -f                     [json|msgpack]       Output format for --emit json:     │
│                                                               json or msgpack.                   │
│                                                               [default: json]                    │
│ --emit                                   [json|neo4j|schema]  Output target: json                │
│                                                               (analysis.json, default) | neo4j   │
│                                                               (graph.cypher or live Bolt push) | │
│                                                               schema (the Neo4j schema.json      │
│                                                               contract).                         │
│                                                               [default: json]                    │
│ --app-name                               TEXT                 Logical application name for the   │
│                                                               graph :PyApplication anchor        │
│                                                               (default: input dir name).         │
│ --neo4j-uri                              TEXT                 Push the graph to a live Neo4j     │
│                                                               over Bolt (incremental); omit to   │
│                                                               write graph.cypher.                │
│                                                               [env var: NEO4J_URI]               │
│ --neo4j-user                             TEXT                 Neo4j username.                    │
│                                                               [env var: NEO4J_USERNAME]          │
│                                                               [default: neo4j]                   │
│ --neo4j-password                         TEXT                 Neo4j password. Prefer the env var │
│                                                               over the flag (the flag is visible │
│                                                               in shell history / process list).  │
│                                                               [env var: NEO4J_PASSWORD]          │
│                                                               [default: neo4j]                   │
│ --neo4j-database                         TEXT                 Neo4j database name (default:      │
│                                                               server default).                   │
│                                                               [env var: NEO4J_DATABASE]          │
│ --codeql              --no-codeql                             Enable CodeQL-based analysis.      │
│                                                               [default: no-codeql]               │
│ --ray                 --no-ray                                Enable Ray for distributed         │
│                                                               analysis.                          │
│                                                               [default: no-ray]                  │
│ --eager               --lazy                                  Enable eager or lazy analysis.     │
│                                                               Defaults to lazy.                  │
│                                                               [default: lazy]                    │
│ --skip-tests          --include-tests                         Skip test files in analysis.       │
│                                                               [default: skip-tests]              │
│ --no-venv             --venv                                  Skip virtualenv creation and       │
│                                                               dependency installation; resolve   │
│                                                               imports against the ambient Python │
│                                                               environment instead.               │
│                                                               [default: venv]                    │
│ --file-name                              PATH                 Analyze only the specified file    │
│                                                               (relative to input directory).     │
│ --cache-dir       -c                     PATH                 Directory to store analysis cache. │
│                                                               Defaults to '.codeanalyzer' in the │
│                                                               input directory.                   │
│ --clear-cache         --keep-cache                            Clear cache after analysis. By     │
│                                                               default, cache is retained.        │
│                                                               [default: keep-cache]              │
│                   -v                     INTEGER              Increase verbosity: -v, -vv, -vvv  │
│                                                               [default: 0]                       │
│ --help                                                        Show this message and exit.        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

Examples

Basic analysis to stdout, or to a file:

canpy --input ./my-python-project                        # compact JSON on stdout
canpy --input ./my-python-project --output ./out         # → ./out/analysis.json

Binary output (msgpack):

canpy --input ./my-python-project --output ./out --format msgpack   # → ./out/analysis.msgpack

Resolve extra call edges with CodeQL:
```
canpy --input ./my-python-project --codeql
```
By default, edges come from Jedi's lexical analysis. Adding --codeql resolves additional edges (including RPC / third-party / dynamically-dispatched targets) and merges them with the Jedi-derived edges; CodeQL also backfills resolved callees Jedi could not resolve. CodeQL integration is experimental; the CLI is downloaded into <cache_dir>/codeql/ on first use.

Emit a Neo4j snapshot, or push to a live database:

canpy --input ./my-python-project --emit neo4j --output ./out   # → ./out/graph.cypher
canpy --input ./my-python-project --emit neo4j \
  --neo4j-uri bolt://localhost:7687 --neo4j-user neo4j --neo4j-password secret

Emit the Neo4j schema contract:

canpy --emit schema                   # print schema.json to stdout (no project needed)
canpy --emit schema --output ./out    # → ./out/schema.json

Force a clean rebuild with a custom cache directory:

canpy --input ./my-python-project --eager --cache-dir /path/to/custom-cache

Output targets

canpy builds one analysis in memory and can emit it three ways (--emit):

`analysis.json` (default)

A PyApplication document — the canonical CLDK contract:

{
  "symbol_table": { /* file path → module (classes, functions, variables, imports, …) */ },
  "call_graph":   [ /* CALL_DEP edges: { source, target, weight, provenance } keyed by callable signature */ ]
}

By default this is printed to stdout in JSON; with --output it is written to analysis.json (or analysis.msgpack with --format msgpack, a more compact binary format).

Neo4j graph

--emit neo4j projects the same analysis into a labeled property graph. Every node label is Py-prefixed and every relationship type is PY_-prefixed (e.g. :PyClass, PY_CALLS) so multiple language analyzers can share one database without label or relationship-type collisions. Declarations are keyed by their signature under a shared :PySymbol label; calls, imports, inheritance, decorators, and call sites are relationships:

Without --neo4j-uri — writes a self-contained graph.cypher (constraints + indexes, a scoped wipe, then batched MERGEs). Load it with cypher-shell < graph.cypher. Needs no extra dependencies.
With --neo4j-uri — pushes to a live Neo4j over Bolt incrementally: only modules whose content hash changed are rewritten, and on a full run modules whose source file vanished are pruned. Requires the neo4j extra. Every graph carries a schema_version on its :PyApplication node.

Call-graph endpoints that aren't present in the symbol table (third-party / framework / RPC targets) are materialized as :PyExternal ghost nodes, mirroring the analyzer's own ghost-node behaviour.

The connection options also read from the standard Neo4j environment variables — NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD, NEO4J_DATABASE — when the corresponding flag is omitted (an explicit flag wins). Prefer the env var for the password so it doesn't land in shell history or the process list:

export NEO4J_URI=bolt://localhost:7687
export NEO4J_PASSWORD=secret
canpy -i ./my-project --emit neo4j     # credentials picked up from the environment

Schema contract

--emit schema writes the machine-readable, version-stamped Neo4j schema (schema.json: node labels, relationships, properties, constraints, and indexes). It needs no project and is checked into the repo as schema.neo4j.json and bundled in every release as a GitHub Release asset, so a consumer can validate producer/consumer compatibility without invoking the tool. The shape of the contract matches the codeanalyzer-typescript backend.

A UML of the analysis.json schema (the PyApplication containment tree) is checked in as schema-uml.drawio, and the property-graph schema as neo4j-schema.drawio.

Development

This project uses uv.

uv sync --all-groups
uv run canpy --input /path/to/project           # run from source
uv run canpy --emit schema > schema.neo4j.json  # regenerate the checked-in schema contract
uv run python scripts/update_readme.py          # regenerate the canpy --help block above
uv run pytest                                   # run the test suite

The Neo4j schema-conformance test always runs. The Neo4j bolt integration test spins up a real Neo4j via Testcontainers and is opt-in — it needs a container runtime (Docker or Podman) and is enabled with an environment variable:

RUN_CONTAINER_TESTS=1 uv run pytest test/test_neo4j_bolt.py -s

License

Apache 2.0 — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 84 Commits
.github		.github
codeanalyzer		codeanalyzer
docs/assets		docs/assets
packaging		packaging
scripts		scripts
test		test
.envrc		.envrc
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
neo4j-schema.drawio		neo4j-schema.drawio
pyproject.toml		pyproject.toml
schema-uml.drawio		schema-uml.drawio
schema.neo4j.json		schema.neo4j.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

codeanalyzer-python (`canpy`)

Table of Contents

Features

Installation

Prerequisites

Install via pip (PyPI)

Install via shell script

Install via Homebrew

Build from source

Usage

Options

Examples

Output targets

`analysis.json` (default)

Neo4j graph

Schema contract

Development

License

About

Uh oh!

Releases 17

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

codeanalyzer-python (canpy)

Table of Contents

Features

Installation

Prerequisites

Install via pip (PyPI)

Install via shell script

Install via Homebrew

Build from source

Usage

Options

Examples

Output targets

analysis.json (default)

Neo4j graph

Schema contract

Development

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 17

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

codeanalyzer-python (`canpy`)

`analysis.json` (default)

Packages