Output schema
canpy builds one analysis in memory and can serialize it two ways. The default is a single PyApplication artifact — analysis.json (or msgpack). With --emit neo4j the same in-memory PyApplication is projected into a labeled property graph instead of a file. This page is the schema reference for both: the JSON model and the Neo4j property graph.
flowchart TB
SRC["Python project (--input)"] --> IR["in-memory PyApplication"]
IR -->|"--emit json (default)"| JSON["analysis.json / msgpack"]
IR -->|"--emit neo4j"| PG["labeled property graph"]
IR -->|"--emit schema"| CONTRACT["schema.json (versioned contract)"]
JSON --> APP[PyApplication]
APP --> ST["symbol_table: {path: PyModule}"]
APP --> CG["call_graph: [PyCallEdge]"]
APP --> EP["entrypoints: {framework: [PyEntrypoint]}"]
ST --> MOD[PyModule]
MOD --> CLS[PyClass]
MOD --> FN[PyCallable]
CLS --> M[PyCallable]
CLS --> ATTR[PyClassAttribute]
FN --> CALL[PyCallsite]
FN --> PARAM[PyCallableParameter]
FN --> DEC[PyDecorator]
PG -->|"no --neo4j-uri"| SNAP["graph.cypher snapshot"]
PG -->|"--neo4j-uri (Bolt)"| LIVE["live Neo4j, incremental"]
SNAP --> NODES[":PyApplication / :PyModule / :PyClass / :PyCallable …"]
LIVE --> NODES
NODES -->|"PY_HAS_MODULE / PY_DECLARES / PY_CALLS …"| NODES
The JSON model (PyApplication)
Section titled “The JSON model (PyApplication)”The default artifact is a single PyApplication. Every model below is a Pydantic model defined in codeanalyzer.schema.py_schema; the JSON and msgpack outputs are serializations of the same schema. Line/column fields default to -1 when unknown.
PyApplication
Section titled “PyApplication”The root object.
| Field | Type | Description |
|---|---|---|
symbol_table | Dict[str, PyModule] | File path → module model. The whole-project inventory. |
call_graph | List[PyCallEdge] | Identity-keyed call edges. |
entrypoints | Dict[str, List[PyEntrypoint]] | Framework name → detected roots. |
PyModule
Section titled “PyModule”One per source file.
| Field | Type | Description |
|---|---|---|
file_path | str | Absolute path to the file. |
module_name | str | Dotted module name. |
imports | List[PyImport] | Import statements. |
comments | List[PyComment] | Comments and docstrings. |
classes | Dict[str, PyClass] | Top-level classes by name. |
functions | Dict[str, PyCallable] | Top-level functions by name. |
variables | List[PyVariableDeclaration] | Module-level variables. |
content_hash, last_modified, file_size | str / float / int | Cache-invalidation metadata. |
PyClass
Section titled “PyClass”| Field | Type | Description |
|---|---|---|
name | str | Class short name. |
signature | str | Fully-qualified identity (e.g. module.ClassName). |
base_classes | List[str] | Names of base classes. |
decorators | List[PyDecorator] | Class decorators. |
methods | Dict[str, PyCallable] | Methods by name. |
attributes | Dict[str, PyClassAttribute] | Class attributes by name. |
inner_classes | Dict[str, PyClass] | Nested classes. |
comments, code | List[PyComment] / str | Docstrings/comments and source. |
start_line, end_line | int | Source span. |
PyCallable
Section titled “PyCallable”A function or method. The richest model in the artifact.
| Field | Type | Description |
|---|---|---|
name | str | Callable short name. |
path | str | File the callable is defined in. |
signature | str | Fully-qualified identity (e.g. module.Class.method). The call-graph node key. |
parameters | List[PyCallableParameter] | Declared parameters. |
return_type | Optional[str] | Resolved return type, if known. |
decorators | List[PyDecorator] | Applied decorators. |
code | Optional[str] | The source body. |
call_sites | List[PyCallsite] | Calls made from this callable. |
accessed_symbols | List[PySymbol] | Symbols read/written in the body. |
local_variables | List[PyVariableDeclaration] | Locals. |
inner_callables, inner_classes | Dict[str, ...] | Nested definitions. |
cyclomatic_complexity | int | Computed complexity. |
is_entrypoint | bool | Whether a finder marked this an entrypoint. |
entrypoint_framework | Optional[str] | The framework, if so. |
start_line, end_line, code_start_line | int | Source spans. |
PyCallsite
Section titled “PyCallsite”A single call made from within a callable — the rich per-call metadata behind a graph edge.
| Field | Type | Description |
|---|---|---|
method_name | str | The invoked name as written. |
receiver_expr, receiver_type | Optional[str] | The receiver expression and its resolved type. |
argument_types | List[str] | Resolved argument types. |
return_type | Optional[str] | Resolved return type. |
callee_signature | Optional[str] | The resolved target’s signature (CodeQL may backfill this). |
is_constructor_call | bool | Whether the call constructs an instance. |
start_line, end_line, … | int | Source location. |
PyCallEdge
Section titled “PyCallEdge”An identity-only call-graph edge.
| Field | Type | Description |
|---|---|---|
source | str | Caller’s PyCallable.signature. |
target | str | Callee’s PyCallable.signature. |
type | "CALL_DEP" | Edge kind. |
weight | int | Edge weight (default 1). |
provenance | List[str] | Which engine(s) produced it: "jedi", "codeql", or an extension token. Open vocabulary. |
tags | Dict[str, str] | Free-form, extension-namespaced metadata (e.g. an ORM-dispatch trigger predicate). Never interpreted by core. |
PyEntrypoint
Section titled “PyEntrypoint”A framework-dispatched root, referencing a callable by signature.
| Field | Type | Description |
|---|---|---|
signature | str | The PyCallable.signature this entrypoint refers to. |
framework | str | The dispatching framework. |
detection_source | str | How it was detected — decorator, base_class, url_resolver, router_mount, blueprint, lambda_template, typer_subapp, click_add_command, argparse_dispatch, convention, or extension. Open vocabulary. |
route_path, http_methods | Optional[str] / List[str] | For HTTP routes. |
celery_task_name, cli_command_name, lambda_handler_key, grpc_service_name | Optional[str] | Framework-specific identifiers, when applicable. |
source_file | Optional[str] | File declaring the binding (urls.py, template.yaml, …). |
tags | Dict[str, str] | Free-form, namespaced metadata for extensions. |
Supporting models
Section titled “Supporting models”PyImport—module,name,alias, and source span.PyComment—content,is_docstring, and source span.PyDecorator—name, resolvedqualified_name, and rawpositional_arguments/keyword_arguments(source-text fragments for finders to parse).PyCallableParameter—name,type,default_value, source span.PyClassAttribute—name,type, comments, source span.PyVariableDeclaration—name,type,initializer,value,scope.PySymbol— a referenced symbol:name,scope,kind, resolvedtype,qualified_name,is_builtin.
Serialization helpers
Section titled “Serialization helpers”Every model is decorated for MessagePack support, exposing to_msgpack_bytes() / from_msgpack_bytes() (gzip-compressed) and to_msgpack_dict() / from_msgpack_dict(). PyApplication additionally exposes get_compression_ratio(). For JSON, use the Pydantic v1/v2 compatibility helpers model_dump_json / model_validate_json from codeanalyzer.schema. Models built via the fluent builder pattern — PyApplication.builder().symbol_table(...).call_graph(...).build().
The Neo4j property graph
Section titled “The Neo4j property graph”--emit neo4j projects the same in-memory PyApplication into a labeled property graph instead of a JSON file. Where analysis.json is one self-contained blob you load whole into memory, the graph is a persistent, queryable system of record: many applications can live in one database — each anchored at its own :PyApplication node — so whole-monorepo or cross-service questions become a Cypher traversal rather than parsing giant JSON files. See the CLI reference for how the two writers (the graph.cypher snapshot and the incremental Bolt push) work.
Every node label is Py-prefixed and every relationship type is PY_-prefixed (e.g. :PyClass, PY_CALLS), so the Java, TypeScript, and Python analyzers can share one database without label or relationship-type collisions. Declarations — classes, callables, and external symbols — are keyed by their signature and merged under a shared :PySymbol label, which is what makes the identity invariant cheap to enforce and cross-module references stable. The labels, relationships, and properties below are generated from codeanalyzer/neo4j/catalog.py and published verbatim as the machine-readable schema contract.
Node labels
Section titled “Node labels”The key is the property the node is MERGEd on. Declaration nodes (:PyClass, :PyCallable, :PyExternal) carry the extra :PySymbol label and are merged on signature.
| Label | Merge label | Key | Notable properties |
|---|---|---|---|
:PyApplication | :PyApplication | name | schema_version — the application anchor, named by --app-name. |
:PyModule | :PyModule | file_key | module_name, content_hash, last_modified, file_size. |
:PyClass | :PySymbol | signature | name, code, base_classes, docstring, start_line, end_line. |
:PyCallable | :PySymbol | signature | name, path, return_type, cyclomatic_complexity, code, code_start_line, start_line. |
:PyExternal | :PySymbol | signature | name, module — a ghost node for a third-party / unresolved target, mirroring the JSON call graph’s ghost-node behavior. |
:PyPackage | :PyPackage | name | An imported package, shared across modules and applications. |
:PyDecorator | :PyDecorator | name | A decorator, shared across callables and applications. |
:PyCallSite | :PyCallSite | id | method_name, receiver_expr, receiver_type, argument_types, return_type, callee_signature, is_constructor_call. |
:PyAttribute | :PyAttribute | id | name, type, docstring, start_line, end_line. |
:PyVariable | :PyVariable | id | name, type, initializer, scope, start_line, end_line. |
Relationship types
Section titled “Relationship types”| Relationship | Endpoints | Notes |
|---|---|---|
PY_HAS_MODULE | (:PyApplication)-[]->(:PyModule) | The application anchor contains each analyzed source module. |
PY_DECLARES | (:PyModule|PyClass|PyCallable)-[]->(:PyClass|PyCallable) | Declaration containment, recursive: modules declare top-level classes/functions; classes and callables declare nested ones. |
PY_HAS_METHOD | (:PyClass)-[]->(:PyCallable) | A class owns a method callable. |
PY_HAS_ATTRIBUTE | (:PyClass)-[]->(:PyAttribute) | A class owns an attribute. |
PY_DECLARES_VAR | (:PyModule|PyCallable)-[]->(:PyVariable) | A module- or function-scoped variable declaration. |
PY_HAS_CALLSITE | (:PyCallable)-[]->(:PyCallSite) | A callable contains the call sites it makes. |
PY_RESOLVES_TO | (:PyCallSite)-[]->(:PyCallable|PyExternal) | A call site resolves to a concrete callable or an external (ghost) symbol. |
PY_CALLS | (:PyCallable|PyExternal)-[]->(:PyCallable|PyExternal) | The call-graph edge. Properties: weight (integer), provenance (string[], e.g. jedi / codeql / an extension token). |
PY_EXTENDS | (:PyClass)-[]->(:PyClass) | Class inheritance (self-referential). |
PY_IMPORTS | (:PyModule)-[]->(:PyPackage) | A module imports a package. Properties: imported_names (string[]), aliases (string[]). |
PY_DECORATED_BY | (:PyCallable)-[]->(:PyDecorator) | A callable is decorated by a decorator. |
The PY_CALLS edge is the property-graph form of PyCallEdge: the same weight and provenance carry over, and the same optional CodeQL augmentation backfills resolved call edges. PY_RESOLVES_TO preserves the finer per-call-site resolution that PyCallsite records in the JSON model.
graph LR
APP[":PyApplication"] -->|PY_HAS_MODULE| MOD[":PyModule"]
MOD -->|PY_DECLARES| CLS[":PyClass"]
MOD -->|PY_DECLARES| FN[":PyCallable"]
MOD -->|PY_IMPORTS| PKG[":PyPackage"]
MOD -->|PY_DECLARES_VAR| VAR[":PyVariable"]
CLS -->|PY_HAS_METHOD| FN
CLS -->|PY_HAS_ATTRIBUTE| ATTR[":PyAttribute"]
CLS -->|PY_EXTENDS| CLS
FN -->|PY_DECORATED_BY| DEC[":PyDecorator"]
FN -->|PY_HAS_CALLSITE| CS[":PyCallSite"]
FN -->|PY_DECLARES_VAR| VAR
CS -->|PY_RESOLVES_TO| FN
CS -->|PY_RESOLVES_TO| EXT[":PyExternal"]
FN -->|PY_CALLS| FN
FN -->|PY_CALLS| EXT
Constraints and indexes
Section titled “Constraints and indexes”Both writers run the same DDL before any load (it is idempotent — every statement is IF NOT EXISTS) so each MERGE is an index seek rather than a label scan, and the identity invariant is enforced by the database itself.
// Uniqueness constraintsCREATE CONSTRAINT py_symbol_sig IF NOT EXISTS FOR (s:PySymbol) REQUIRE s.signature IS UNIQUE;CREATE CONSTRAINT py_app_name IF NOT EXISTS FOR (a:PyApplication) REQUIRE a.name IS UNIQUE;CREATE CONSTRAINT py_module_key IF NOT EXISTS FOR (m:PyModule) REQUIRE m.file_key IS UNIQUE;CREATE CONSTRAINT py_package_name IF NOT EXISTS FOR (p:PyPackage) REQUIRE p.name IS UNIQUE;CREATE CONSTRAINT py_decorator_name IF NOT EXISTS FOR (d:PyDecorator) REQUIRE d.name IS UNIQUE;CREATE CONSTRAINT py_callsite_id IF NOT EXISTS FOR (c:PyCallSite) REQUIRE c.id IS UNIQUE;CREATE CONSTRAINT py_attribute_id IF NOT EXISTS FOR (a:PyAttribute) REQUIRE a.id IS UNIQUE;CREATE CONSTRAINT py_variable_id IF NOT EXISTS FOR (v:PyVariable) REQUIRE v.id IS UNIQUE;
// Lookup indexesCREATE INDEX py_callable_name IF NOT EXISTS FOR (c:PyCallable) ON (c.name);CREATE INDEX py_class_name IF NOT EXISTS FOR (c:PyClass) ON (c.name);
// Fulltext index for code search over callable bodies and docstringsCREATE FULLTEXT INDEX py_code_fts IF NOT EXISTS FOR (c:PyCallable) ON EACH [c.code, c.docstring];The py_code_fts fulltext index backs code search across everything loaded into the database — query it with db.index.fulltext.queryNodes, then filter to one application by walking back to its anchor:
CALL db.index.fulltext.queryNodes('py_code_fts', 'subprocess AND shell')YIELD node, scoreMATCH (app:PyApplication {name: 'my-service'}) -[:PY_HAS_MODULE]->(:PyModule)-[:PY_DECLARES*1..]->(node)RETURN node.signature AS callable, scoreORDER BY score DESCLIMIT 20;Because every subgraph hangs off its :PyApplication anchor, every query scopes to one application by matching {name: '<app-name>'} — the same value passed as --app-name at emit time. That scoping is also what keeps a shared database multi-tenant: a push for one application only touches its own anchored subtree.
The schema contract
Section titled “The schema contract”--emit schema serializes this catalog — node labels, relationship types, and their property types — to a version-stamped schema.json. It is a static catalog, so no project is required:
# Print the contract to stdout (no project needed)canpy --emit schema
# Or write it to a directorycanpy --emit schema --output ./out # → ./out/schema.jsonThe contract carries SCHEMA_VERSION (currently 1.1.0), the same value stamped onto every graph’s :PyApplication node. It is checked in as schema.neo4j.json and shipped as a GitHub Release asset, so consumers can pin to a version and detect contract changes:
{ "schema_version": "1.1.0", "generator": "codeanalyzer-python", "node_labels": [ { "label": "PyApplication", "merge_label": "PyApplication", "key": "name", "properties": { "name": "string", "schema_version": "string" } } ]}Reading the graph back with CLDK
Section titled “Reading the graph back with CLDK”The CLDK Python SDK has a read-only Neo4j backend that reconstructs these same typed models from the graph — no JDK, no native binary, and no project source on the consumer, only read-only Neo4j credentials. Pass a Neo4jConnectionConfig whose application_name matches the --app-name the graph was loaded with, and CLDK.python rebuilds the same PyClass / PyCallable objects and the same networkx call graph the in-process analyzer would produce:
from cldk import CLDKfrom cldk.analysis.commons.backend_config import Neo4jConnectionConfig
# The graph is populated out of band by `canpy --emit neo4j`; the SDK only reads it.analysis = CLDK.python( backend=Neo4jConnectionConfig( uri="bolt://localhost:7687", username="neo4j", password="neo4j", application_name="my-service", # matches canpy --app-name ),)
classes = analysis.get_classes() # Dict[str, PyClass]cg = analysis.get_call_graph() # networkx.DiGraph keyed by callable signaturesThe SDK’s neo4j driver is an optional extra (pip install cldk[neo4j]). See the Neo4j guide for the full read API, and the CLI reference for how producers and consumers split.