--emit neo4j: J_CALLS materializes only ~14% of the call graph (edges absent though both endpoint nodes exist) #158

Description

@rahlk

Summary

--emit neo4j materializes only a small fraction of the call graph. On daytrader8 the emitted graph has 287 J_CALLS edges, while the analysis.json call_graph from an equivalent level-2 run has ~2077 edges, i.e. roughly 14% coverage. Edges are missing even when both endpoint :JCallable nodes exist in the graph, so external-target gating cannot be the only cause.

Environment

  • codeanalyzer-java 2.4.0 (--analysis-level 2, live Bolt push)

Evidence

  • The call graph is stable across runs: two --emit json runs gave 2077 and 2076 call_graph edges.
  • The same project run with --emit neo4j produced 287 J_CALLS edges.
  • Of 1855 fully-resolved edges in analysis.json (both endpoints are app callables present in the symbol table), only ~287 unique edges have a J_CALLS edge in the graph; ~1523 are absent even though both endpoint :JCallable nodes exist:
// for an absent edge, both nodes resolve:
MATCH (a:JCallable {id:$sid}) RETURN a   // exists
MATCH (b:JCallable {id:$tid}) RETURN b   // exists
MATCH (:JCallable {id:$sid})-[r:J_CALLS]->(:JCallable {id:$tid}) RETURN count(r)  // 0

Example absent edges (both endpoints present as nodes):
...web.websocket.ActionMessage#... -> ...util.Log#error(...),
...web.prims.PingWebSocketJson#... -> ...web.websocket.JsonMessage#....
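The comparison behind the numbers above can be sketched roughly as follows. This is a minimal illustration, not the script used for the report: the edge layout (`source` / `target` dicts carrying `type_declaration` and `signature`) is an assumed shape for the call_graph entries, and the Neo4j side is assumed to have been exported beforehand as (source id, target id) pairs. Only the id scheme, type_declaration + "#" + signature, comes from the issue itself.

```python
from typing import Iterable, Set, Tuple

EdgePair = Tuple[str, str]


def callable_id(vertex: dict) -> str:
    # Id scheme quoted in this issue: type_declaration + "#" + signature.
    return f'{vertex["type_declaration"]}#{vertex["signature"]}'


def edge_pairs(call_graph: Iterable[dict]) -> Set[EdgePair]:
    # ASSUMPTION: each edge looks like {"source": {...}, "target": {...}}.
    # Using a set deduplicates repeated call sites between the same callables.
    return {(callable_id(e["source"]), callable_id(e["target"])) for e in call_graph}


def coverage(json_pairs: Set[EdgePair], neo4j_pairs: Set[EdgePair]):
    # Returns the edges present in analysis.json but absent from Neo4j,
    # plus the fraction of analysis.json edges that Neo4j does contain.
    missing = json_pairs - neo4j_pairs
    ratio = len(json_pairs & neo4j_pairs) / len(json_pairs) if json_pairs else 1.0
    return missing, ratio


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    graph = [
        {"source": {"type_declaration": "a.B", "signature": "m()"},
         "target": {"type_declaration": "a.C", "signature": "n(int)"}},
        {"source": {"type_declaration": "a.C", "signature": "n(int)"},
         "target": {"type_declaration": "a.B", "signature": "m()"}},
    ]
    in_neo4j = {("a.B#m()", "a.C#n(int)")}
    missing, ratio = coverage(edge_pairs(graph), in_neo4j)
    print(sorted(missing), ratio)
```

Each pair in `missing` can then be fed as `$sid` / `$tid` into the three Cypher checks above to confirm that both nodes exist while the relationship does not.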

Likely area

neo4j/GraphProjector.java, projectCallGraph
b.edgeIfBothResolved("J_CALLS", new NodeRef("JSymbol","id",from), new NodeRef("JSymbol","id",to), props)
where from and to are each type_declaration + "#" + signature. Since both endpoint nodes exist with exactly those ids, edgeIfBothResolved should retain the edges. That leaves two candidates: either the resolution key used during projection differs from the node id actually emitted (a normalization mismatch between the call_graph vertices' type_declaration/signature and the symbol-table ids), or the --emit neo4j path projects a much smaller call graph than --emit json. It is worth confirming whether the two emit paths build the SDG identically.
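One way to tell the two hypotheses apart is to bucket each absent edge by how its endpoint ids relate to the node ids actually emitted. The sketch below is only an assumption-laden illustration: `normalize` is a hypothetical stand-in (it strips whitespace) for whatever normalization the projector might apply, and `node_ids` is assumed to be the set of `id` values exported from the graph.

```python
from typing import Dict, Iterable, List, Set, Tuple

EdgePair = Tuple[str, str]


def normalize(cid: str) -> str:
    # HYPOTHETICAL normalization: drop all whitespace, e.g. "m(int, int)" -> "m(int,int)".
    # Replace with whatever rule is suspected in the real projector.
    return "".join(cid.split())


def classify_missing(missing: Iterable[EdgePair], node_ids: Set[str]) -> Dict[str, List[EdgePair]]:
    # Precompute the normalized form of every emitted node id once.
    normalized_ids = {normalize(n) for n in node_ids}
    report: Dict[str, List[EdgePair]] = {"exact": [], "normalized_only": [], "unresolved": []}
    for src, dst in missing:
        if src in node_ids and dst in node_ids:
            # Ids match verbatim, so a key mismatch does not explain the drop.
            report["exact"].append((src, dst))
        elif normalize(src) in normalized_ids and normalize(dst) in normalized_ids:
            # Ids match only after normalization: points at a key mismatch.
            report["normalized_only"].append((src, dst))
        else:
            # At least one endpoint has no corresponding node at all.
            report["unresolved"].append((src, dst))
    return report
```

If most absent edges land in `exact`, the key-mismatch explanation looks unlikely and the smaller-call-graph explanation becomes the better lead; a large `normalized_only` bucket would suggest the opposite.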

Impact

Any consumer reading the call graph from Neo4j (e.g. the CLDK SDK's read-only Java backend) sees ~14% of the call edges the in-memory / analysis.json backend reports — callers/callees, class call graphs, and reachability are all substantially incomplete.
