Output schema overview

codeanalyzer-java extracts one symbol table and call graph per run, then emits that intermediate representation (IR) through one of two targets selected by --emit:

--emit json (the default) — a single analysis.json document (or the same JSON on stdout). Self-contained, loaded whole by its consumer.
--emit neo4j — the same IR projected into a Neo4j property graph: a graph.cypher snapshot, or a live incremental push over Bolt. A queryable, persistent system of record that composes across many applications in one database.

Both targets carry the same model — types, callables, fields, parameters, call sites, variables, enum constants, record components, initialization blocks, CRUD operations, comments, annotations, packages. The graph is a lossless projection of the IR, not a summary of it. This page describes the top-level shape of the JSON document and how it relates to the graph; the sub-pages cover each section in detail.

Top-level shape (`analysis.json`)

```json
{
  "symbol_table": {
    "/absolute/path/to/File.java": { /* JavaCompilationUnit */ }
  },
  "call_graph": [ /* edges, present only at analysis level 2 */ ],
  "version": "2.4.1"
}
```

| Key | Type | When present | Description |
|---|---|---|---|
| `symbol_table` | object | always | Map of absolute file path → compilation unit. See Symbol table. |
| `call_graph` | array | level 2 only | Caller→callee edges from WALA. See Call graph. |
| `version` | string | always | The analyzer version that produced this document, e.g. `"2.4.1"`. |
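A consumer of the JSON target might start by validating this top-level shape. The sketch below is a minimal, hypothetical helper (the names `load_analysis`, `check_top_level`, and `call_graph_edges` are illustrative and not part of codeanalyzer or CLDK). It treats a missing `call_graph` and an explicit `null` the same way, since a run below analysis level 2 carries no edges.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

REQUIRED_KEYS = ("symbol_table", "version")  # documented as always present


def check_top_level(doc: dict[str, Any]) -> dict[str, Any]:
    """Raise ValueError unless doc has the documented top-level shape."""
    missing = [key for key in REQUIRED_KEYS if key not in doc]
    if missing:
        raise ValueError(f"analysis.json is missing required keys: {missing}")
    if not isinstance(doc["symbol_table"], dict):
        raise ValueError("symbol_table must map file paths to compilation units")
    return doc


def load_analysis(path: str | Path) -> dict[str, Any]:
    """Read analysis.json from disk and validate its top level."""
    with open(path, encoding="utf-8") as fh:
        return check_top_level(json.load(fh))


def call_graph_edges(doc: dict[str, Any]) -> list[Any]:
    """Caller->callee edges; empty when the run was below analysis level 2."""
    # A missing key and an explicit null are treated the same way.
    return doc.get("call_graph") or []
```

For example, `call_graph_edges(load_analysis("analysis.json"))` yields an empty list for a level-1 document instead of raising.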

Serialization conventions

The JSON is produced by Gson with a fixed configuration:

Field naming: LOWER_CASE_WITH_UNDERSCORES — Java fields like filePath serialize as file_path, callableDeclarations as callable_declarations, and so on.
Nulls preserved: serializeNulls is on, so absent values appear as explicit null rather than being omitted. Consumers can rely on keys existing.
Pretty-printed, with HTML escaping disabled (so <, >, & in code/strings stay literal).
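When you know a Java field name from the model and need the matching JSON key, the rename can be reproduced offline. The function below is an illustrative Python port of the rule Gson documents for LOWER_CASE_WITH_UNDERSCORES (insert an underscore before every uppercase letter that is not the first character, then lowercase everything); the function name is hypothetical.

```python
from __future__ import annotations


def gson_lower_case_with_underscores(java_field: str) -> str:
    """Python port of Gson's LOWER_CASE_WITH_UNDERSCORES renaming rule."""
    out: list[str] = []
    for ch in java_field:
        # An underscore goes before every uppercase letter except a leading one.
        if ch.isupper() and out:
            out.append("_")
        out.append(ch)
    return "".join(out).lower()
```

Because the rule splits on every capital, acronyms split letter by letter: Gson's own documentation maps `aURL` to `a_u_r_l`.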

The Neo4j projection

The same compilation units, types, and call edges become first-class nodes and relationships in a Neo4j property graph. Every node label is J-prefixed (:JApplication, :JCompilationUnit, :JType, :JCallable, …) and every relationship type J_-prefixed (:J_HAS_UNIT, :J_DECLARES_TYPE, :J_CALLS, …). The prefix is deliberate: a Java graph shares one Neo4j database with the Python (Py* / PY_*) and TypeScript (TS* / TS_*) backends without colliding, so a polyglot portfolio lives in a single store.

Each run is anchored at a (:JApplication {name, schema_version}) node whose name is the value of --app-name, and every analyzed file hangs off it via (:JApplication)-[:J_HAS_UNIT]->(:JCompilationUnit). Because each application is rooted at its own anchor, many applications coexist side by side, and you query across them with Cypher instead of parsing one large JSON document per project.

```shell
# Live incremental push over Bolt (fat jar): app-scoped, content-hash diffed
export NEO4J_PASSWORD=secret
codeanalyzer -i /path/to/project -a 2 \
  --emit neo4j --app-name daytrader8 \
  --neo4j-uri bolt://localhost:7687 --neo4j-user neo4j --neo4j-database neo4j
```
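Once one or more applications have been pushed, a single Cypher query can summarize all of them. The sketch below uses the official neo4j Python driver's `Driver.execute_query` (5.x API). The query and the helper names `units_per_app` and `connect` are illustrative; only the labels and relationship type come from the schema described here.

```python
from __future__ import annotations

from typing import Any

# Only the labels and relationship type come from the documented schema;
# the query itself is an illustrative example.
UNITS_PER_APP = """
MATCH (a:JApplication)-[:J_HAS_UNIT]->(u:JCompilationUnit)
RETURN a.name AS app, a.schema_version AS schema_version, count(u) AS units
ORDER BY app
"""


def connect(uri: str, user: str, password: str) -> Any:
    """Open a neo4j Driver; imported lazily so the helper below works with any stand-in."""
    from neo4j import GraphDatabase

    return GraphDatabase.driver(uri, auth=(user, password))


def units_per_app(driver: Any, database: str = "neo4j") -> list[dict[str, Any]]:
    """One row per :JApplication anchor with its compilation-unit count."""
    # Driver.execute_query returns (records, summary, keys) in the 5.x driver.
    records, _summary, _keys = driver.execute_query(UNITS_PER_APP, database_=database)
    return [record.data() for record in records]
```

Against a live database you would write `with connect("bolt://localhost:7687", "neo4j", os.environ["NEO4J_PASSWORD"]) as driver: print(units_per_app(driver))`.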

The two --emit neo4j sub-modes are decided purely by whether a Bolt URI resolved (the --neo4j-uri flag or the NEO4J_URI env var):

No URI → a graph.cypher snapshot. It is self-contained and re-runnable: constraints and indexes, a scoped wipe of this application’s prior subgraph, then batched UNWIND … MERGE of nodes and edges. Because it always rewrites the application's complete state, it is not incremental. Load it with cypher-shell < graph.cypher.
URI present → a live incremental Bolt push. The driver reads the database’s current state and updates only what changed: it diffs each compilation unit’s content_hash (a SHA-256 over the unit), replaces only the subgraphs of changed units via idempotent MERGE upserts, and, on a full run, prunes units whose source file has vanished. Shared :JPackage and :JAnnotation nodes are only ever upserted with MERGE, so they stay intact for every other application that references them.
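The diff step of the incremental push can be pictured as a comparison of two hash maps keyed by file path. This is a hypothetical Python model of the behavior described above, not the analyzer's implementation: the exact bytes fed to SHA-256 are an assumption, and writing the plan back to Neo4j is omitted.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def content_hash(unit_bytes: bytes) -> str:
    """SHA-256 hex digest of a unit's serialized form (assumed input)."""
    return hashlib.sha256(unit_bytes).hexdigest()


@dataclass
class PushPlan:
    added: list[str] = field(default_factory=list)      # new file: write its subgraph
    changed: list[str] = field(default_factory=list)    # hash differs: replace subgraph
    unchanged: list[str] = field(default_factory=list)  # hash equal: skip entirely
    pruned: list[str] = field(default_factory=list)     # source vanished: delete subgraph


def plan_push(stored: dict[str, str], current: dict[str, str], full_run: bool) -> PushPlan:
    """Compare hashes stored in the graph with freshly computed ones, keyed by path."""
    plan = PushPlan()
    for path in sorted(current):
        old = stored.get(path)
        if old is None:
            plan.added.append(path)
        elif old != current[path]:
            plan.changed.append(path)
        else:
            plan.unchanged.append(path)
    # Only a full run has seen every file, so only then is absence meaningful.
    if full_run:
        plan.pruned = sorted(set(stored) - set(current))
    return plan
```

Gating pruning behind `full_run` reflects the rule above: a partial run has not visited every file, so a path missing from it is not evidence that the source was deleted.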

For the full node-label and relationship topology — every property on every node, the gating rules, and the constraints and indexes shipped as DDL — see the dedicated reference:

Neo4j graph schema: the complete property-graph contract, covering J-prefixed node labels, J_-prefixed relationships, properties, and the constraints + fulltext index that ship with it.

Versioning: two contracts, two version fields

The JSON document and the graph each declare their own version, and the two are independent:

| Field | Lives on | Identifies |
|---|---|---|
| `version` | the root of `analysis.json` | the analyzer that produced the document, e.g. `"2.4.1"`. |
| `schema_version` | the `:JApplication` node in the graph | the graph schema contract, currently `1.0.0`. |

schema_version is stamped on the application anchor of every emitted graph, so any consumer can read it back and confirm the contract before traversing. The machine-readable form of that contract is schema.neo4j.json, which you can publish directly without analyzing a project:

```shell
# Print the property-graph schema contract to stdout (no project analysis)
codeanalyzer --emit schema
```

Pass -o <dir> to write it to <dir>/schema.neo4j.json instead of stdout. The document enumerates every declared node label, relationship type, and property; the projector is tested to never emit an undeclared label, relationship, or property, so the contract and the graph stay in lockstep.
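Confirming the contract before traversing, as described for schema_version above, might look like the following sketch with the neo4j Python driver. It accepts any schema_version with the same major number; treating schema_version as semantic versioning is an assumption of this consumer policy, not a documented guarantee, and `SUPPORTED_MAJOR`, the query, and the helper names are illustrative.

```python
from __future__ import annotations

from typing import Any

SUPPORTED_MAJOR = 1  # assumption: this consumer was written against schema 1.x

APP_SCHEMA_VERSION = (
    "MATCH (a:JApplication {name: $name}) RETURN a.schema_version AS schema_version"
)


def is_compatible(schema_version: str, supported_major: int = SUPPORTED_MAJOR) -> bool:
    """Accept any release whose major component matches supported_major."""
    major = schema_version.split(".", 1)[0]
    return major.isdecimal() and int(major) == supported_major


def check_app_schema(driver: Any, app_name: str, database: str = "neo4j") -> str:
    """Read schema_version off the application anchor and refuse unknown majors."""
    # Extra keyword arguments to execute_query become Cypher parameters ($name).
    records, _summary, _keys = driver.execute_query(
        APP_SCHEMA_VERSION, name=app_name, database_=database
    )
    if not records:
        raise LookupError(f"no :JApplication named {app_name!r}")
    version = records[0]["schema_version"]
    if not isinstance(version, str) or not is_compatible(version):
        raise RuntimeError(f"unsupported graph schema_version {version!r}")
    return version
```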

The JSON document has no separate schema field: version doubles as its schema identifier. The CLDK Python SDK’s Pydantic models (JApplication, JType, JCallable, …) are locked to a compatible version, and your own consumers should likewise treat version as the schema key.

When the JSON schema changes incompatibly, version is bumped. One concrete example: imports changed from bare strings to structured objects ({ path, is_static, is_wildcard }). A tool that reads an older analysis.json with newer code, or a newer document with older code, should check version first. See the legacy import schema guard.
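Rather than hard-coding which version introduced structured imports, a reader can also detect the legacy shape directly. The sketch below is a hypothetical guard in the spirit of the legacy import schema guard, not that guard itself; it assumes each compilation unit stores its imports under an `imports` key, and the function names are illustrative.

```python
from __future__ import annotations

from typing import Any


def legacy_import_files(doc: dict[str, Any]) -> list[str]:
    """Paths of compilation units whose imports are bare strings (old shape)."""
    legacy = []
    for path, unit in (doc.get("symbol_table") or {}).items():
        # Assumed key: each unit keeps its imports under "imports".
        imports = (unit or {}).get("imports") or []
        if any(isinstance(entry, str) for entry in imports):
            legacy.append(path)
    return legacy


def require_structured_imports(doc: dict[str, Any]) -> None:
    """Fail fast instead of misreading {path, is_static, is_wildcard} objects."""
    legacy = legacy_import_files(doc)
    if legacy:
        raise ValueError(
            f"version {doc.get('version')!r} uses legacy string imports in "
            f"{len(legacy)} file(s); regenerate analysis.json or use an older reader"
        )
```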

Reading either output from the Python SDK

The CLDK Python SDK reconstructs the same typed model objects — JType, JCallable, and a networkx call graph — from either output. Point it at a project directory to run the in-process analyzer, or at a Bolt URI to read a graph that was populated out of band. The graph path needs no JDK, no native binary, and no project source on the consumer — only read-only credentials and the application_name that matches the --app-name the graph was loaded with:

```python
from cldk import CLDK
from cldk.analysis import AnalysisLevel
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

analysis = CLDK.java(
    analysis_level=AnalysisLevel.call_graph,
    backend=Neo4jConnectionConfig(
        uri="bolt://localhost:7687",
        username="neo4j",
        password="neo4j",
        application_name="daytrader8",  # == --app-name on the producing run
    ),
)
symbol_table = analysis.get_symbol_table()  # Dict[str, JCompilationUnit]
cg = analysis.get_call_graph()              # networkx.DiGraph
```

See Python SDK (CLDK) for the full read-back API.

The two sections

These describe the JSON document; the same model is mirrored node-for-node in the graph.

Symbol table: compilation units, types, callables, fields, comments, imports; the always-present structural model.

Call graph: the caller→callee edge array added at analysis level 2.