CRUD detection

codeanalyzer-java surfaces data-access patterns by detecting CRUD operations in method bodies and attaching them to the relevant callable. This lets you audit where an application reads from and writes to persistent storage without reading every method by hand.

Detection is dispatched per framework by a CRUDFinderFactory. Today JPA / Jakarta Persistence detection is fully implemented; Spring Data and JDBC finders exist but are currently stubs.

Each callable carries two arrays:

{
  crud_operations: JCRUDOperation[]  // persistence operations (persist/find/merge/remove, ...)
  crud_queries: JCRUDQuery[]         // query definitions (createQuery / createNamedQuery)
}

Each JCRUDOperation has the following shape:

{
  line_number: number
  operation_type: "CREATE" | "READ" | "UPDATE" | "DELETE"
  target_table: string        // reserved; not yet populated
  involved_columns: string[]  // reserved; not yet populated
  condition: string           // reserved; not yet populated
  joined_tables: string[]     // reserved; not yet populated
}
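As a rough illustration of consuming this shape, the sketch below tallies operations by type for a single callable. The record is hand-written for the example and is not real analyzer output; the reserved fields are left out because the source does not say what value they carry while unpopulated.

```python
from collections import Counter

# Hypothetical callable record shaped like the two arrays described above.
# Line numbers and operations are invented for illustration.
callable_record = {
    "crud_operations": [
        {"line_number": 42, "operation_type": "CREATE"},
        {"line_number": 57, "operation_type": "READ"},
        {"line_number": 63, "operation_type": "READ"},
    ],
    "crud_queries": [],
}


def count_operations(record: dict) -> Counter:
    """Count detected operations by operation_type for one callable."""
    return Counter(op["operation_type"] for op in record.get("crud_operations", []))


counts = count_operations(callable_record)
print(dict(counts))
```

A Counter returns 0 for operation types that never appear, so a check such as `counts["DELETE"] == 0` works without special-casing missing keys.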

For JPA, calls on the EntityManager and on query objects map to operation types:

Operation type | Detected from
-------------- | -------------
CREATE | EntityManager.persist(...)
READ | EntityManager.find(...); query execution getResultList(), getSingleResult(), getFirstResult(), getMaxResults()
UPDATE | EntityManager.merge(...); query executeUpdate()
DELETE | EntityManager.remove(...)
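The mapping in the table can be pictured as a simple lookup from method name to operation type. The sketch below is an illustration only, not the analyzer's implementation: `JPA_METHOD_TO_OPERATION` and `classify_jpa_call` are hypothetical names, and the sketch ignores the receiver type, whereas the table ties each method to the EntityManager or to a query object.

```python
from typing import Optional

# Method names taken from the table above, keyed to the operation type they imply.
# Hypothetical helper: it looks only at the method name, not the receiver type.
JPA_METHOD_TO_OPERATION = {
    "persist": "CREATE",
    "find": "READ",
    "getResultList": "READ",
    "getSingleResult": "READ",
    "getFirstResult": "READ",
    "getMaxResults": "READ",
    "merge": "UPDATE",
    "executeUpdate": "UPDATE",
    "remove": "DELETE",
}


def classify_jpa_call(method_name: str) -> Optional[str]:
    """Return the operation type for a JPA method name, or None if it is not listed."""
    return JPA_METHOD_TO_OPERATION.get(method_name)


for name in ("persist", "getResultList", "executeUpdate", "flush"):
    print(name, "->", classify_jpa_call(name))
```

Note that executeUpdate() is reported as UPDATE regardless of the query text, following the table; whether the statement was an update or a delete is reflected in the separate query record.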

Query definitions (as opposed to executions) are captured separately:

{
  line_number: number
  query_arguments: string[]  // the query string and any parameters
  query_type: "READ" | "WRITE" | "NAMED"
}

These come from EntityManager.createQuery(String) and EntityManager.createNamedQuery(String). The query_type is inferred from the query text:

query_type | Inferred when
---------- | -------------
READ | the query string begins with select
WRITE | the query string begins with update, delete, or insert
NAMED | the query was created via createNamedQuery(...)
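The inference rules in the table can be sketched as a small prefix check. This is a minimal illustration under assumptions, not the analyzer's code: the function name is hypothetical, and trimming whitespace, ignoring case, and returning None for unrecognized text are choices made for the example that the source does not confirm.

```python
from typing import Optional


def infer_query_type(query_text: str, named: bool = False) -> Optional[str]:
    """Classify a query definition following the table above.

    Assumptions for this sketch: leading whitespace is ignored, matching is
    case-insensitive, and unrecognized text yields None.
    """
    if named:
        # Created via createNamedQuery(...): the argument is a name, not query text.
        return "NAMED"
    head = query_text.strip().lower()
    if head.startswith("select"):
        return "READ"
    if head.startswith(("update", "delete", "insert")):
        return "WRITE"
    return None


print(infer_query_type("SELECT c FROM Customer c"))
print(infer_query_type("delete from Customer c where c.id = :id"))
print(infer_query_type("Customer.findAll", named=True))
```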

Current status:

  • JPA / Jakarta Persistence: implemented as described above.
  • Spring Data: finder present but stubbed; repository-derived queries are not yet classified.
  • JDBC: finder present but stubbed; raw Statement / PreparedStatement calls are not yet classified.
  • The target_table, involved_columns, condition, and joined_tables fields on JCRUDOperation are reserved for future enrichment and are not populated yet.

When you project the analysis with --emit neo4j, CRUD detection is not flattened into the callable — it becomes first-class graph structure. Each detected operation is a :JCrudOperation node and each query a :JCrudQuery node, hung off its owning :JCallable or :JCallSite:

(:JCallable | :JCallSite)-[:J_HAS_CRUD_OPERATION]->(:JCrudOperation)
(:JCallable | :JCallSite)-[:J_HAS_CRUD_QUERY]->(:JCrudQuery)

That turns “where does this application write to persistent storage?” into a Cypher traversal across the whole graph — and, once many applications share one database, across the entire portfolio. For example, every method that issues a write:

MATCH (c:JCallable)-[:J_HAS_CRUD_OPERATION]->(op:JCrudOperation)
WHERE op.operation_type IN ['CREATE', 'UPDATE', 'DELETE']
RETURN c.signature, op.operation_type

JCrudOperation exposes operation_type along with the target_table, involved_columns, condition, and joined_tables properties — keep in mind those last four are reserved (see above), so today you filter on operation_type. See the Neo4j graph-schema reference for the full node and relationship inventory.
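To run the write query from a program, one option is the official Neo4j Python driver. The sketch below is illustrative: `find_methods_by_operation` is a hypothetical helper, the connection URI and credentials in the usage comment are placeholders, and the query is the one shown above with the operation types passed as a parameter.

```python
# Parameterized form of the Cypher query shown above.
CRUD_METHODS_QUERY = """
MATCH (c:JCallable)-[:J_HAS_CRUD_OPERATION]->(op:JCrudOperation)
WHERE op.operation_type IN $op_types
RETURN c.signature AS signature, op.operation_type AS operation_type
"""


def find_methods_by_operation(driver, op_types):
    """Return (signature, operation_type) pairs for the given operation types.

    `driver` is expected to be a neo4j.Driver (or any object exposing a
    compatible execute_query method).
    """
    records, _summary, _keys = driver.execute_query(
        CRUD_METHODS_QUERY,
        parameters_={"op_types": list(op_types)},
    )
    return [(r["signature"], r["operation_type"]) for r in records]


# Usage (requires the neo4j package and a running database; values are placeholders):
#
#   from neo4j import GraphDatabase
#   with GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password")) as driver:
#       for sig, op in find_methods_by_operation(driver, ["CREATE", "UPDATE", "DELETE"]):
#           print(op, sig)
```

Passing the operation types as a query parameter rather than interpolating them into the string keeps one query text reusable for reads and writes alike.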

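The same data is available from Python through the cldk package:

from cldk import CLDK
from cldk.analysis import AnalysisLevel

# Build a symbol-table-level analysis of the project.
analysis = CLDK.java(
    project_path="my-app",
    analysis_level=AnalysisLevel.symbol_table,
)

# Walk every method of every class and print its detected CRUD operations.
for cls in analysis.get_classes():
    for sig, m in analysis.get_methods_in_class(cls).items():
        for op in m.crud_operations:
            print(f"{op.operation_type} at {cls}:{op.line_number}")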

The same query works unchanged against a graph that was produced out of band: pass a Neo4jConnectionConfig instead of project_path and the read-only Neo4j backend reconstructs the identical models — no JDK, native binary, or project source required. Set application_name to the --app-name the graph was loaded with. See the Neo4j graph output guide for the full read-back flow.

Combine with entry-point and call-graph data to answer questions like “which externally-reachable methods perform writes?”
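As a sketch of that combination, the example below walks a call graph breadth-first from a set of entry points and keeps the reachable methods that perform a write. All data here is invented: the method names, the call graph, and the `operations_by_method` mapping are hypothetical stand-ins for what you would assemble from the analysis output.

```python
from collections import deque

WRITE_TYPES = {"CREATE", "UPDATE", "DELETE"}


def reachable_writers(entry_points, call_graph, operations_by_method):
    """Return, sorted, the methods reachable from any entry point that perform a write."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        method = queue.popleft()
        for callee in call_graph.get(method, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return sorted(
        m for m in seen if WRITE_TYPES & set(operations_by_method.get(m, ()))
    )


# Hypothetical inputs: caller -> callees, and method -> detected operation types.
call_graph = {
    "OrderController.create": ["OrderService.place"],
    "OrderService.place": ["OrderRepository.save"],
    "ReportJob.run": ["OrderRepository.purge"],
}
operations_by_method = {
    "OrderService.place": ["READ"],
    "OrderRepository.save": ["CREATE"],
    "OrderRepository.purge": ["DELETE"],
}

result = reachable_writers(["OrderController.create"], call_graph, operations_by_method)
print(result)
```

In this sample, OrderRepository.purge performs a DELETE but is only called from ReportJob.run, which is not among the entry points, so it is excluded from the result.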