feat(neo4j): J-namespaced, lossless Neo4j graph output (#154) #155

Merged
rahlk merged 3 commits into main from feature/issue-154-neo4j-and-fix-153 on Jun 23, 2026
@rahlk rahlk commented Jun 22, 2026

Collaborator

Closes #154. Also carries the #153 native-image fix (already merged-as-closed).

What this does

Brings the Java Neo4j backend (--emit neo4j) to parity with the Python/TS siblings and makes it a lossless projection of the analysis IR.

Namespacing (shared-DB safe)

  • All node labels J-prefixed, rel types J_-prefixed, constraint/index names j_-prefixed.
  • Provenance prop renamed _unit → _module (matches siblings).
  • --emit schema → schema.neo4j.json; NEO4J_URI/USERNAME/PASSWORD/DATABASE env fallback (flag > env > default).
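
The flag > env > default precedence can be sketched as below. This is a minimal illustration, not the PR's code: the class name `Neo4jSettingResolver`, the `resolve` signature, and the `bolt://localhost:7687` default are all assumptions made for the example.

```java
import java.util.Map;

/** Hypothetical helper: resolves one connection setting as flag > env > default. */
final class Neo4jSettingResolver {
    private Neo4jSettingResolver() {}

    /**
     * @param flagValue    value passed on the command line, or null if the flag was omitted
     * @param env          environment to consult (System.getenv() in real use)
     * @param envKey       environment variable name, e.g. "NEO4J_URI"
     * @param defaultValue value used when neither the flag nor the variable is set
     */
    static String resolve(String flagValue, Map<String, String> env,
                          String envKey, String defaultValue) {
        if (flagValue != null && !flagValue.isBlank()) {
            return flagValue;              // 1. an explicit flag always wins
        }
        String fromEnv = env.get(envKey);
        if (fromEnv != null && !fromEnv.isBlank()) {
            return fromEnv;                // 2. otherwise fall back to the environment
        }
        return defaultValue;               // 3. otherwise use the built-in default
    }

    public static void main(String[] args) {
        // Assumed default URI, for illustration only.
        String uri = resolve(null, System.getenv(), "NEO4J_URI", "bolt://localhost:7687");
        System.out.println(uri);
    }
}
```

Passing the environment in as a map keeps the precedence logic testable without mutating the process environment.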

Lossless projection

  • New first-class nodes :JInitializationBlock, :JCrudOperation, :JCrudQuery, :JComment (+ their rels); J_HAS_CALLSITE/J_DECLARES_VAR extended to init blocks.
  • Added every previously-dropped scalar prop (is_modified, file_path, variable_initializers_json, default_value, argument_expr, is_unspecified, parameter/variable columns, docstrings, J_CALLS source_kind/destination_kind).
  • CypherWriter.DESCENDANTS extended so the scoped wipe / orphan-prune reach the new containment edges.

Driver-free Bolt seam

  • BoltConfig + BoltSink are driver-free core types; BoltWriter is the only class importing org.neo4j.driver.* and is loaded reflectively. The fat jar bundles the driver (live Bolt push works); the GraalVM native image never statically references it and falls back to writing graph.cypher.
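
A rough sketch of the reflective seam described above, under stated assumptions: `GraphSink`, `CypherFileSink`, and `SinkLoader` are hypothetical stand-ins, since the actual signatures of BoltSink, BoltWriter, and Neo4jEmitter are not shown in this PR text. The idea is that core code names the driver-backed class only as a string, so nothing links against the driver at compile time.

```java
import java.lang.reflect.Constructor;

/** Driver-free seam: core code only ever sees this interface (illustrative shape). */
interface GraphSink {
    String describe();
}

/** Fallback that needs no driver; stands in for writing graph.cypher. */
final class CypherFileSink implements GraphSink {
    @Override
    public String describe() {
        return "cypher-file";
    }
}

final class SinkLoader {
    private SinkLoader() {}

    /**
     * Tries to load the driver-backed sink by name. If the class (or the
     * driver it depends on) is missing, falls back to the file sink.
     */
    static GraphSink load(String implClassName) {
        try {
            Class<?> impl = Class.forName(implClassName);
            Constructor<?> ctor = impl.getDeclaredConstructor();
            return (GraphSink) ctor.newInstance();
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            // Class absent, driver absent, or wrong type: degrade gracefully.
            return new CypherFileSink();
        }
    }
}
```

Catching `LinkageError` as well as `ReflectiveOperationException` matters here because a class that is present but whose dependencies are missing typically fails with `NoClassDefFoundError` rather than `ClassNotFoundException`.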

Native (#153)

  • Comprehensive JavaParser AST reflection metadata so the native image can analyze projects (was: NoSuchFieldError on every parse).
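
For illustration, reflection metadata of this kind is expressed as entries in a GraalVM `reflect-config.json` file. The sketch below renders such entries for a list of class names; the `ReflectConfigSketch` class is hypothetical, and the PR does not say how its metadata was actually produced.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical generator for GraalVM reflect-config.json entries. */
final class ReflectConfigSketch {
    private ReflectConfigSketch() {}

    /** One entry registering all declared fields, methods and constructors of a class. */
    static String entry(String className) {
        return "  {\n"
             + "    \"name\": \"" + className + "\",\n"
             + "    \"allDeclaredFields\": true,\n"
             + "    \"allDeclaredMethods\": true,\n"
             + "    \"allDeclaredConstructors\": true\n"
             + "  }";
    }

    /** Renders the JSON array for the given fully qualified class names. */
    static String render(List<String> classNames) {
        return classNames.stream()
                .map(ReflectConfigSketch::entry)
                .collect(Collectors.joining(",\n", "[\n", "\n]\n"));
    }

    public static void main(String[] args) {
        // Two real JavaParser AST classes, chosen only as examples.
        System.out.print(render(List.of(
                "com.github.javaparser.ast.body.VariableDeclarator",
                "com.github.javaparser.ast.expr.MethodCallExpr")));
    }
}
```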

Verification

  • Schema conformance test 3/3 (16 node labels, 20 rel types, 15 constraints); checked-in schema.neo4j.json matches the catalog.
  • Fat-jar live Bolt push verified; native --emit json/--emit neo4j verified on real apps (daytrader8, plantsbywebsphere).
  • neo4j-schema.drawio refreshed to the J-prefixed schema, with the call graph (J_CALLS / J_RESOLVES_TO) drawn explicitly.
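
The anti-drift idea behind the schema conformance test can be sketched as a set difference: anything the projector emits that the catalog does not declare is a failure. `SchemaDriftCheck` is a hypothetical name, and the real Neo4jSchemaConformanceTest may be structured differently.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical drift check: emitted names must be a subset of declared names. */
final class SchemaDriftCheck {
    private SchemaDriftCheck() {}

    /** Returns the emitted labels (or rel types, or props) the catalog does not declare. */
    static Set<String> undeclared(Set<String> declared, Set<String> emitted) {
        Set<String> extra = new TreeSet<>(emitted); // sorted for stable failure messages
        extra.removeAll(declared);
        return extra;
    }

    public static void main(String[] args) {
        Set<String> declared = Set.of("JInitializationBlock", "JCrudOperation", "JComment");
        Set<String> emitted = Set.of("JComment", "JCrudQuery");
        // A conformance test would fail if this set is non-empty.
        System.out.println(undeclared(declared, emitted));
    }
}
```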

Not included

PyPI/JDK distribution scaffolding was intentionally left out of this PR.

rahlk and others added 2 commits June 20, 2026 04:16
Port the codeanalyzer-typescript 0.4.0 Neo4j feature to Java with the same
arg entrypoints:

  --emit json|neo4j|schema  (default json)
  --app-name, --neo4j-uri, --neo4j-user, --neo4j-password, --neo4j-database

New com.ibm.cldk.neo4j package:
  - GraphProjector: pure projection of the symbol table (+ level-2 call graph)
    to graph rows. Type/Callable share a :Symbol identity; call sites, fields,
    parameters, variables, enum constants, record components are first-class
    nodes; annotations/packages are shared; entrypoints are a marker label;
    every unit-owned node carries a _unit provenance prop.
  - CypherWriter: self-contained graph.cypher snapshot (constraints, scoped
    wipe, batched UNWIND/MERGE).
  - BoltWriter: live incremental push over Bolt — diffs each compilation unit's
    content_hash, replaces only changed units (idempotent MERGE), prunes
    vanished units on a full run. Uses neo4j-java-driver 4.4.x (JDK 11/native).
  - SchemaCatalog + Schema: the in-repo graph contract (labels, relationships,
    typed properties, DDL); --emit schema serializes it to schema.json.
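
The content_hash diffing that BoltWriter performs can be pictured roughly as follows. This is a sketch under assumptions: `UnitDiffSketch` and its maps of unit path to hash are invented for the example, and the real writer reads stored hashes from the database rather than from an in-memory map.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical planner: decides which compilation units to replace or prune. */
final class UnitDiffSketch {
    private UnitDiffSketch() {}

    /** Units that are new, or whose content hash differs from the stored one. */
    static Set<String> changed(Map<String, String> stored, Map<String, String> current) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            if (!e.getValue().equals(stored.get(e.getKey()))) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** Units present in the store but absent from the current full run. */
    static Set<String> vanished(Map<String, String> stored, Map<String, String> current) {
        Set<String> out = new TreeSet<>(stored.keySet());
        out.removeAll(current.keySet());
        return out;
    }
}
```

Only the `changed` set needs to be rewritten, and the `vanished` set is pruned on a full run; unchanged units are left untouched, which is what keeps a re-push idempotent.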

Tests:
  - Neo4jSchemaConformanceTest (no container): anti-drift guard asserting the
    projector never emits a label/rel/property the catalog doesn't declare, and
    that schema.neo4j.json is current.
  - Neo4jBoltWriterTest (opt-in, Testcontainers Neo4j): full push, idempotent
    re-push, and orphan pruning against a real database. Runs only when
    RUN_CONTAINER_TESTS is set.

Docs/release/packaging:
  - README: install one-liner + Neo4j graph output section + refreshed --help.
  - release.yml: publish codeanalyzer.jar, schema.json and the installer as
    release assets, with cargo-dist-style release notes.
  - packaging/install/codeanalyzer-installer.sh: curl/wget installer that fetches
    the jar and drops a `codeanalyzer` launcher on PATH.
  - neo4j-schema.drawio: diagram of the emitted property-graph schema.
  - schema.neo4j.json: checked-in graph contract. Bump version to 2.4.0.
…e image can analyze (#153)

The GraalVM native image crashed on every project with
`java.lang.NoSuchFieldError: variables` -- JavaParser's metamodel
(PropertyMetaModel.getValue) reflects over AST node fields, which aren't
registered under native-image's closed world, so any parse died before reaching
any emit target. Only `--emit schema` (no analysis) worked.

Fix: register all 266 `com.github.javaparser.ast.**` classes with
allDeclaredFields/Methods/Constructors (per-fixture tracing-agent capture did not
generalize -- unseen apps hit fresh NoSuchFieldErrors like `pairs`/`elements`),
plus the tracing-agent-captured reflect/jni/resource entries for the parse path.

Verified: native `--emit json -a 1` and `--emit neo4j` now succeed on every test
fixture, including the large real apps (daytrader8, plantsbywebsphere) that
previously crashed.

Note: a residual, native-inherent limitation remains -- symbol resolution via
JavaParser's ReflectionTypeSolver is ~20% degraded on JDK/dependency types not
registered for reflection (+ version reports "unknown"). The cldk SDK sidesteps
this by running the jar on a bundled HotSpot JVM (full fidelity), so the native
gap affects only direct use of the standalone native binary.
@rahlk rahlk force-pushed the feature/issue-154-neo4j-and-fix-153 branch from 2e42ac3 to e5c6065 Compare June 22, 2026 23:09
… Bolt seam (#154)

Brings the Java Neo4j backend to parity with the Python/TypeScript siblings and
makes it a lossless projection of the IR.

Namespacing (so a Java graph can share a Neo4j DB with Py*/TS* graphs):
- all node labels J-prefixed, all relationship types J_-prefixed, constraint/index
  names j_-prefixed.
- provenance property renamed _unit -> _module (matches the siblings).
- --emit schema now writes schema.neo4j.json (matches the checked-in contract);
  release asset + README updated.
- NEO4J_URI/USERNAME/PASSWORD/DATABASE env-var fallback (flag > env > default).

Lossless projection (every Lombok entity field is represented):
- new first-class nodes :JInitializationBlock, :JCrudOperation, :JCrudQuery,
  :JComment, with J_HAS_INIT_BLOCK / J_HAS_CRUD_OPERATION / J_HAS_CRUD_QUERY /
  J_HAS_COMMENT and J_HAS_CALLSITE/J_DECLARES_VAR extended to init blocks.
- added scalar props: is_modified, file_path (callable), variable_initializers_json,
  default_value, argument_expr, is_unspecified, start/end_column on params & vars,
  docstrings, and source_kind/destination_kind on J_CALLS.
- CypherWriter.DESCENDANTS extended so the scoped wipe / orphan prune reach the new
  containment edges.

Packaging seam (lets the GraalVM native image prune the driver):
- BoltConfig + BoltSink extracted as driver-free core types; BoltWriter is the only
  class importing org.neo4j.driver.* and is loaded reflectively by Neo4jEmitter, so
  the fat jar bundles the driver (live Bolt push works) while native-image, which
  never statically references it, falls back to writing graph.cypher.

Schema contract regenerated (16 node labels, 20 relationship types, 15 constraints);
conformance test updated; neo4j-schema.drawio refreshed to the J-prefixed schema
with the call graph (J_CALLS / J_RESOLVES_TO) drawn explicitly.
@rahlk rahlk force-pushed the feature/issue-154-neo4j-and-fix-153 branch from e5c6065 to 9fc85c4 Compare June 22, 2026 23:42
@rahlk rahlk self-assigned this Jun 23, 2026
@rahlk rahlk added the enhancement (New feature or request), breaking (Breaking Change), kind/feature (Feature), fix (Bug fix), and doc (Documentation) labels Jun 23, 2026
@rahlk rahlk merged commit f32312a into main Jun 23, 2026
@rahlk rahlk deleted the feature/issue-154-neo4j-and-fix-153 branch June 23, 2026 00:16
Development

Successfully merging this pull request may close these issues.

Add Neo4j property-graph output (--emit neo4j) with a lossless, namespaced schema