
Output schema overview

codeanalyzer-java extracts one symbol table and call graph per run, then emits that intermediate representation through one of two targets selected by --emit:

  • --emit json (the default) — a single analysis.json document (or the same JSON on stdout). Self-contained, loaded whole by its consumer.
  • --emit neo4j — the same IR projected into a Neo4j property graph: a graph.cypher snapshot, or a live incremental push over Bolt. A queryable, persistent system of record that composes across many applications in one database.

Both targets carry the same model — types, callables, fields, parameters, call sites, variables, enum constants, record components, initialization blocks, CRUD operations, comments, annotations, packages. The graph is a lossless projection of the IR, not a summary of it. This page describes the top-level shape of the JSON document and how it relates to the graph; the sub-pages cover each section in detail.

{
  "symbol_table": {
    "/absolute/path/to/File.java": { /* JavaCompilationUnit */ }
  },
  "call_graph": [ /* edges; present only at analysis level 2 */ ],
  "version": "2.4.1"
}
Key          | Type   | When present | Description
symbol_table | object | always       | Map of absolute file path → compilation unit. See Symbol table.
call_graph   | array  | level 2 only | Caller→callee edges from WALA. See Call graph.
version      | string | always       | The analyzer version that produced this document, e.g. "2.4.1".
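As a rough illustration of the top-level shape, a consumer might load the document and summarize it as below. This is a minimal sketch, not part of the analyzer or the SDK: the helper names `load_analysis` and `summarize` are hypothetical, and the sample document is a stand-in built from the skeleton above.

```python
import json
from pathlib import Path


def load_analysis(path: str) -> dict:
    """Parse an analysis.json file into a plain dict (hypothetical helper)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize(doc: dict) -> dict:
    """Summarize the three top-level keys of a parsed document."""
    symbol_table = doc["symbol_table"]        # always present
    # call_graph is only populated at analysis level 2; tolerate it being
    # missing or null at lower levels.
    call_graph = doc.get("call_graph") or []
    return {
        "version": doc["version"],            # always present
        "files": len(symbol_table),
        "call_edges": len(call_graph),
    }


# Stand-in document mirroring the skeleton above (no call graph).
sample = {
    "symbol_table": {"/absolute/path/to/File.java": {}},
    "version": "2.4.1",
}
print(summarize(sample))
```

In practice `summarize(load_analysis("analysis.json"))` would be the usual entry point, assuming the file was written by a `--emit json` run.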

The JSON is produced by Gson with a fixed configuration:

  • Field naming: LOWER_CASE_WITH_UNDERSCORES — Java fields like filePath serialize as file_path, callableDeclarations as callable_declarations, and so on.
  • Nulls preserved: serializeNulls is on, so absent values appear as explicit null rather than being omitted. Consumers can rely on keys existing.
  • Pretty-printed, with HTML escaping disabled (so <, >, & in code/strings stay literal).
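To make the naming and null rules concrete, the sketch below approximates the field-name mapping in Python and shows what preserved nulls mean for a consumer. The function name is hypothetical, the regex is an approximation of Gson's policy rather than its actual code, and `example_key` is an invented key used only to demonstrate the null behaviour.

```python
import json
import re


def lower_case_with_underscores(java_field: str) -> str:
    """Approximate Gson's LOWER_CASE_WITH_UNDERSCORES naming policy:
    insert '_' before every upper-case letter except a leading one,
    then lower-case the result."""
    return re.sub(r"(?<!^)([A-Z])", r"_\1", java_field).lower()


print(lower_case_with_underscores("filePath"))              # file_path
print(lower_case_with_underscores("callableDeclarations"))  # callable_declarations

# With serializeNulls on, an absent value is written as an explicit null,
# so the key is present and maps to None after parsing.
# "example_key" is a made-up key for illustration only.
unit = json.loads('{"file_path": "/absolute/path/to/File.java", "example_key": null}')
print("example_key" in unit, unit["example_key"] is None)   # True True
```

A consumer can therefore index keys directly and test for `None`, rather than guarding every access with a membership check.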

The same compilation units, types, and call edges become first-class nodes and relationships in a Neo4j property graph. Every node label is J-prefixed (:JApplication, :JCompilationUnit, :JType, :JCallable, …) and every relationship type J_-prefixed (:J_HAS_UNIT, :J_DECLARES_TYPE, :J_CALLS, …). The prefix is deliberate: a Java graph shares one Neo4j database with the Python (Py* / PY_*) and TypeScript (TS* / TS_*) backends without colliding, so a polyglot portfolio lives in a single store.

Each run is anchored at a (:JApplication {name, schema_version}) node — the value of --app-name — and every analyzed file hangs off it via (:JApplication)-[:J_HAS_UNIT]->(:JCompilationUnit). Because each application is rooted at its own anchor, many applications coexist side by side and you query across them with Cypher instead of parsing one giant JSON blob per project.
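A cross-application query over the anchors could look roughly like the following. The Cypher uses only the labels, relationship type, and properties named above; running it through the official `neo4j` Python driver (5.x, `execute_query`) is an assumption about the consumer's tooling, and `units_per_app` is a hypothetical helper name.

```python
# Count compilation units per application, across every app in the database.
UNITS_PER_APP = (
    "MATCH (a:JApplication)-[:J_HAS_UNIT]->(u:JCompilationUnit) "
    "RETURN a.name AS app, a.schema_version AS schema_version, count(u) AS units "
    "ORDER BY app"
)


def units_per_app(uri: str, user: str, password: str, database: str = "neo4j"):
    """Run UNITS_PER_APP and return (app, schema_version, units) tuples.

    Requires the `neo4j` package; imported lazily so the query constant
    can be reused without the driver installed.
    """
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        records, _summary, _keys = driver.execute_query(
            UNITS_PER_APP, database_=database
        )
        return [(r["app"], r["schema_version"], r["units"]) for r in records]
```

Calling `units_per_app("bolt://localhost:7687", "neo4j", password)` against a populated database would return one row per `:JApplication` anchor.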

# Live incremental push over Bolt (fat jar) — app-scoped, content-hash diffed
export NEO4J_PASSWORD=secret
codeanalyzer -i /path/to/project -a 2 \
  --emit neo4j --app-name daytrader8 \
  --neo4j-uri bolt://localhost:7687 --neo4j-user neo4j --neo4j-database neo4j

The two --emit neo4j sub-modes are decided purely by whether a Bolt URI resolved (the --neo4j-uri flag or the NEO4J_URI env var):

  • No URI → a graph.cypher snapshot. Self-contained and re-runnable: constraints and indexes, a scoped wipe of this application’s prior subgraph, then batched UNWIND … MERGE of nodes and edges. It expresses full truth, so it is not incremental. Load it with cypher-shell < graph.cypher.
  • URI present → a live incremental Bolt push. The driver reads the database’s current state and updates only what changed: it diffs each compilation unit’s content_hash (a SHA-256 over the unit), replaces only changed units’ subgraphs via idempotent MERGE upserts, and on a full run prunes units whose source file vanished. Shared :JPackage / :JAnnotation nodes are upserted MERGE-only and left intact across apps.
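The decision and the diff described above can be modelled in a few lines. This is an illustrative sketch under stated assumptions, not the analyzer's implementation: `resolve_mode` and `plan_push` are hypothetical names, the flag is assumed to win when both the flag and `NEO4J_URI` are set, and the hashes are treated as opaque strings already computed by the producer.

```python
import os
from typing import Mapping, Optional


def resolve_mode(cli_uri: Optional[str], env: Mapping[str, str] = os.environ) -> str:
    """Pick the --emit neo4j sub-mode from whether a Bolt URI resolved.
    Assumption: the flag takes precedence over the NEO4J_URI variable."""
    uri = cli_uri or env.get("NEO4J_URI")
    return "bolt-push" if uri else "cypher-snapshot"


def plan_push(stored: Mapping[str, str],
              current: Mapping[str, str],
              full_run: bool) -> dict:
    """Compare per-unit content hashes (path -> hash) and plan the update.

    stored  : hashes already in the database for this application
    current : hashes from the present analysis run
    """
    replace = sorted(p for p, h in current.items() if stored.get(p) != h)
    skip = sorted(p for p, h in current.items() if stored.get(p) == h)
    # Units whose source file vanished are pruned only on a full run.
    prune = sorted(set(stored) - set(current)) if full_run else []
    return {"replace": replace, "prune": prune, "skip": skip}


plan = plan_push(
    stored={"A.java": "h1", "B.java": "h2", "C.java": "h3"},
    current={"A.java": "h1", "B.java": "h9", "D.java": "h4"},
    full_run=True,
)
print(plan)  # {'replace': ['B.java', 'D.java'], 'prune': ['C.java'], 'skip': ['A.java']}
```

Only `B.java` (changed) and `D.java` (new) would be re-upserted here, while `A.java` is left untouched and `C.java` is pruned because the run was full.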

For the full node-label and relationship topology — every property on every node, the gating rules, and the constraints and indexes shipped as DDL — see the dedicated reference:

Versioning: two contracts, two version fields


The JSON document and the graph each declare their own version, and the two are independent:

Field          | Lives on                            | Identifies
version        | the root of analysis.json           | the analyzer that produced the document, e.g. "2.4.1".
schema_version | the :JApplication node in the graph | the graph schema contract, currently 1.0.0.

schema_version is stamped on the application anchor of every emitted graph, so any consumer can read it back and confirm the contract before traversing. The machine-readable form of that contract is schema.neo4j.json, which you can publish directly without analyzing a project:

# Print the property-graph schema contract to stdout (no project analysis)
codeanalyzer --emit schema

Pass -o <dir> to write it to <dir>/schema.neo4j.json instead of stdout. The document enumerates every declared node label, relationship type, and property; the projector is tested to never emit an undeclared label, relationship, or property, so the contract and the graph stay in lockstep.
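A consumer that wants to confirm the contract before traversing might read `schema_version` off the anchor and compare it with the version it was built against. The sketch below assumes semantic-versioning-style compatibility (same major version), which this page does not state; the query constant and `is_compatible` are hypothetical names.

```python
# Cypher to read the contract version back from one application's anchor.
READ_SCHEMA_VERSION = (
    "MATCH (a:JApplication {name: $name}) "
    "RETURN a.schema_version AS schema_version"
)

SUPPORTED_MAJOR = 1  # assumption: this consumer targets the 1.x graph contract


def is_compatible(schema_version: str, supported_major: int = SUPPORTED_MAJOR) -> bool:
    """Treat graphs with the same major version as compatible (assumed rule)."""
    major = int(schema_version.split(".")[0])
    return major == supported_major


print(is_compatible("1.0.0"))  # True
print(is_compatible("2.0.0"))  # False
```

The value passed to `is_compatible` would come from running `READ_SCHEMA_VERSION` with `name` set to the `--app-name` of the producing run.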

For the JSON document, the version field is the schema identifier: the CLDK Python SDK’s Pydantic models (JApplication, JType, JCallable, …) are locked to a compatible version, and your own consumers should treat version as the schema key.

When the JSON schema changes incompatibly, version bumps. One concrete example: imports changed from bare strings to structured objects ({ path, is_static, is_wildcard }). Tools reading an older analysis.json against newer code — or vice versa — should check the version. See the legacy import schema guard.
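A consumer that has to tolerate both import shapes could normalize them to the structured form, roughly as follows. This is not the guard referenced above, only a sketch of the idea: `normalize_import` is a hypothetical helper, and the treatment of legacy strings (the string is the path, a trailing `.*` marks a wildcard, static imports cannot be detected) is an assumption about the older format.

```python
from typing import Union


def normalize_import(entry: Union[str, dict]) -> dict:
    """Return an import in the structured shape { path, is_static, is_wildcard }."""
    if isinstance(entry, str):
        # Legacy shape: a bare string. Assumed to hold only the import path,
        # so is_static cannot be recovered and defaults to False.
        return {
            "path": entry,
            "is_static": False,
            "is_wildcard": entry.endswith(".*"),
        }
    # Current shape: already structured; copy the three documented keys.
    return {
        "path": entry["path"],
        "is_static": bool(entry["is_static"]),
        "is_wildcard": bool(entry["is_wildcard"]),
    }


print(normalize_import("java.util.*"))
print(normalize_import({"path": "java.util.List", "is_static": False, "is_wildcard": False}))
```

Checking `version` first remains the more reliable guard; shape-sniffing like this is a fallback for documents of unknown provenance.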

The CLDK Python SDK reconstructs the same typed model objects (JType, JCallable, and a networkx call graph) from either output. Point it at a project directory to run the in-process analyzer, or at a Bolt URI to read a graph that was populated out of band. The graph path needs no JDK, no native binary, and no project source on the consumer: only read-only credentials and the application_name that matches the --app-name the graph was loaded with:

from cldk import CLDK
from cldk.analysis import AnalysisLevel
from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

analysis = CLDK.java(
    analysis_level=AnalysisLevel.call_graph,
    backend=Neo4jConnectionConfig(
        uri="bolt://localhost:7687",
        username="neo4j",
        password="neo4j",
        application_name="daytrader8",  # == --app-name on the producing run
    ),
)
symbol_table = analysis.get_symbol_table()  # Dict[str, JCompilationUnit]
cg = analysis.get_call_graph()              # networkx.DiGraph

See Python SDK (CLDK) for the full read-back API.

The sub-pages describe the JSON document; the same model is mirrored node-for-node in the graph.