Core

The CLDK class is the top-level entry point. Call the per-language factory, e.g. CLDK.java(project_path=...), to get an analysis object over your project. You never instantiate JavaAnalysis or PythonAnalysis directly; the factory hands you the correct one.

One call per language, always the same shape:

from cldk import CLDK
analysis = CLDK.java(project_path="commons-cli")
# -> JavaAnalysis, ready to query

The per-language factory methods are the top-level API:

  • CLDK.java(...) -> JavaAnalysis
  • CLDK.python(...) -> PythonAnalysis
  • CLDK.typescript(...) -> TypeScriptAnalysis
  • CLDK.c(project_path) -> CAnalysis

Each factory is backed by a language-specific static analysis engine, which produces the symbol table and call graph.

flowchart LR
    C["CLDK"] --> JF["CLDK.java(project_path)"]
    C --> PF["CLDK.python(project_path)"]
    JF --> J[JavaAnalysis]
    PF --> P[PythonAnalysis]
    J --> M[Typed models]
    P --> M

Analysis levels. The depth of analysis is governed by analysis_level. The default, AnalysisLevel.symbol_table, populates classes, methods, and fields. Call-graph computation incurs additional cost: get_call_graph, get_callers, and get_callees require AnalysisLevel.call_graph. Set it up front when call relationships are needed.
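As a rough illustration of choosing the level up front, the sketch below maps the methods you plan to call to the minimal level name. The helper names (required_level_name, open_java) are hypothetical and not part of cldk; only the three call-graph method names and the two AnalysisLevel members come from the text above.

```python
# Methods that, per the text above, require AnalysisLevel.call_graph.
CALL_GRAPH_METHODS = frozenset({"get_call_graph", "get_callers", "get_callees"})


def required_level_name(planned_methods):
    """Return "call_graph" if any planned method needs it, else "symbol_table"."""
    if CALL_GRAPH_METHODS.intersection(planned_methods):
        return "call_graph"
    return "symbol_table"


def open_java(project_path, planned_methods):
    """Hypothetical wrapper: open a Java project at the minimal level needed."""
    # Imported lazily so the helper above stays usable without cldk installed.
    from cldk import CLDK
    from cldk.analysis import AnalysisLevel

    level = getattr(AnalysisLevel, required_level_name(planned_methods))
    return CLDK.java(project_path=project_path, analysis_level=level)


print(required_level_name(["get_classes"]))                 # symbol_table
print(required_level_name(["get_classes", "get_callers"]))  # call_graph
```

Deciding once, before the factory call, avoids building a symbol-table-only analysis and then finding that get_callers needs the deeper level.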

| Argument | Applies to | What it does |
| --- | --- | --- |
| project_path | all | Path to the project directory to analyze. |
| analysis_level | Java, Python, TypeScript | AnalysisLevel.symbol_table (default) or AnalysisLevel.call_graph. The latter is required for call graphs, callers, and callees. |
| target_files | Java, Python, TypeScript | Restrict analysis to specific files. |
| eager | Java, Python, TypeScript | Force a fresh analysis even when a cached artifact exists. |
| backend | Java, Python, TypeScript | A parameter object selecting the backend. Omit for the default in-process analyzer; pass Neo4jConnectionConfig(...) for a read-only Neo4j backend. |

The backend is chosen by the type of the backend= config object:

  • Omit backend= to use the default in-process analyzer. To tune it, pass CodeAnalyzerConfig(...) (Java/TypeScript) or PyCodeAnalyzerConfig(...) (Python, which adds use_codeql and use_ray).
  • Pass Neo4jConnectionConfig(uri=..., username=..., password=..., database=..., application_name=...) for a read-only Neo4j backend.

from cldk.analysis.commons.backend_config import (
    CodeAnalyzerConfig,
    PyCodeAnalyzerConfig,
    Neo4jConnectionConfig,
)
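One way to switch between the two backends is to build the config object conditionally. The sketch below is only an illustration: the helper backend_from_env and every CLDK_NEO4J_* environment variable name are hypothetical, as are the fallback values. The keyword names passed to Neo4jConnectionConfig are the ones listed above.

```python
import os


def backend_from_env(env=None):
    """Return a Neo4j backend config if one is configured, else None."""
    env = os.environ if env is None else env
    uri = env.get("CLDK_NEO4J_URI")  # hypothetical variable name
    if not uri:
        return None  # backend=None selects the default in-process analyzer
    # Imported lazily so this helper can be loaded without cldk installed.
    from cldk.analysis.commons.backend_config import Neo4jConnectionConfig

    return Neo4jConnectionConfig(
        uri=uri,
        username=env.get("CLDK_NEO4J_USER", "neo4j"),              # assumed default
        password=env.get("CLDK_NEO4J_PASSWORD", ""),               # assumed default
        database=env.get("CLDK_NEO4J_DATABASE", "neo4j"),          # assumed default
        application_name=env.get("CLDK_NEO4J_APP", "commons-cli"), # assumed default
    )


print(backend_from_env({}))  # None
```

The result can then be passed straight through, for example CLDK.python(project_path="my_pkg", backend=backend_from_env()), since None is the documented default for backend.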

Analysis artifacts are cached under cache_dir (default <project>/.codeanalyzer, with per-language artifacts under <cache_dir>/<language>/). Caching is on by default for Java and TypeScript.
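To make the cache layout concrete, here is a small sketch that computes where artifacts would land under the rule stated above. The helper artifact_dir is hypothetical, and the exact directory name used for each language (here "java") is an assumption.

```python
from pathlib import Path


def artifact_dir(project_path, language, cache_dir=None):
    """Resolve <cache_dir>/<language>/, defaulting cache_dir to <project>/.codeanalyzer."""
    if cache_dir is not None:
        root = Path(cache_dir)
    else:
        root = Path(project_path) / ".codeanalyzer"
    return root / language  # per-language subdirectory (name assumed)


print(artifact_dir("commons-cli", "java").as_posix())
# commons-cli/.codeanalyzer/java
print(artifact_dir("commons-cli", "java", cache_dir="/tmp/cldk").as_posix())
# /tmp/cldk/java
```

Knowing the location helps when deciding whether to pass eager=True or simply to delete a stale artifact directory.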

See the generated reference below for the full signature.

The recurring sample project is Apache Commons CLI, unpacked at commons-cli.

from cldk import CLDK
from cldk.analysis import AnalysisLevel
analysis = CLDK.java(
    project_path="commons-cli",
    analysis_level=AnalysisLevel.call_graph,  # needed for call-graph methods
)
print(type(analysis).__name__) # JavaAnalysis
print(len(analysis.get_classes())) # 23

The CodeAnalyzer backend ships with the package; results are cached under <project>/.codeanalyzer and reused on later runs. From here, every method lives on analysis; see the Java API reference for the full surface.

from cldk import CLDK
analysis = CLDK.python(project_path="my_pkg")
print(type(analysis).__name__) # PythonAnalysis
classes = analysis.get_classes() # Dict[str, PyClass]

Same shape, same method names; just call CLDK.python(...) instead. Methods are documented on the Python API reference.

For an introduction, see What is CLDK? and the Quickstart. For task-oriented snippets, see Common tasks and the cookbook. The concepts page explains analysis levels and call graphs in detail, and the cheat sheet provides a one-page summary.

The full generated reference follows.

API reference generated from cldk 1.2.0.

Core CLDK module.

This module provides the top-level entry point for the Code Language Development Kit (CLDK), a unified framework for performing static analysis across multiple programming languages. The primary interface is the CLDK class, which serves as a factory for creating language-specific analysis objects, tree-sitter parsers, and sanitization utilities.

CLDK supports the following languages:

  • Java: Full static analysis via CodeAnalyzer backend, including symbol tables, call graphs, and code metrics.
  • Python: Static analysis via codeanalyzer-python backend with optional CodeQL-augmented call graph resolution.
  • C: Basic analysis via libclang for parsing and extracting code structure.

Typical usage calls a per-language factory method, such as CLDK.java(...), to obtain a language-specific analysis facade. The older pattern of instantiating CLDK with a target language and then calling analysis is deprecated.

Note: This module requires language-specific backends to be available:

  • Java: codeanalyzer-*.jar (auto-downloaded or specified via path)
  • Python: codeanalyzer-python (auto-installed in virtualenv)
  • C: libclang (must be installed on the system)

class CLDK

Core class for the Code Language Development Kit (CLDK).

The CLDK class serves as the primary entry point and factory for all code analysis operations. It provides a unified interface for initializing language-specific analysis facades, tree-sitter parsers, and code sanitization utilities.

This class follows the factory pattern, where the language parameter determines which concrete analysis implementation is returned by the analysis, treesitter_parser, and tree_sitter_utils methods.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| language | str | The target programming language for analysis. Supported values are "java", "python", and "c" (case-sensitive). |

Raises:

  • NotImplementedError: Raised by factory methods when the specified language is not yet supported.

See Also

  • JavaAnalysis: Java-specific analysis facade.
  • PythonAnalysis: Python-specific analysis facade.
  • CAnalysis: C-specific analysis facade.

| Name | Type | Description |
| --- | --- | --- |
| language | str | |

java(project_path: str | Path | None = None, source_code: str | None = None, analysis_level: str = AnalysisLevel.symbol_table, target_files: List[str] | None = None, eager: bool = False, backend: JavaBackend | None = None) -> JavaAnalysis

Create a Java analysis facade.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| project_path | str \| Path \| None | Path to the Java project directory. |
| source_code | str \| None | Single Java source string (deprecated; pass project_path instead). |
| analysis_level | str | Analysis depth (see AnalysisLevel). |
| target_files | List[str] \| None | Restrict analysis to these files. |
| eager | bool | Force regeneration of cached analysis. |
| backend | JavaBackend \| None | Backend configuration. Defaults to CodeAnalyzerConfig. |

Raises:

  • CldkInitializationException: If neither or both of project_path and source_code are provided.

python(project_path: str | Path | None = None, analysis_level: str = AnalysisLevel.symbol_table, target_files: List[str] | None = None, eager: bool = False, backend: PyBackend | None = None) -> PythonAnalysis

Create a Python analysis facade.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| project_path | str \| Path \| None | Path to the Python project directory. Optional only when backend is a Neo4jConnectionConfig (the graph is populated out of band). |
| analysis_level | str | Analysis depth (see AnalysisLevel). |
| target_files | List[str] \| None | Restrict analysis to these files. |
| eager | bool | Force regeneration of cached analysis. |
| backend | PyBackend \| None | Backend configuration. Defaults to PyCodeAnalyzerConfig; pass a Neo4jConnectionConfig to use the read-only Neo4j backend. |

typescript(project_path: str | Path | None = None, analysis_level: str = AnalysisLevel.symbol_table, target_files: List[str] | None = None, eager: bool = False, backend: TSBackend | None = None) -> TypeScriptAnalysis

Create a TypeScript analysis facade.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| project_path | str \| Path \| None | Path to the TypeScript project directory. Optional only when backend is a Neo4jConnectionConfig (the graph is populated out of band). |
| analysis_level | str | Analysis depth (see AnalysisLevel). |
| target_files | List[str] \| None | Restrict analysis to these files. |
| eager | bool | Force regeneration of cached analysis. |
| backend | TSBackend \| None | Backend configuration. Defaults to CodeAnalyzerConfig; pass a Neo4jConnectionConfig to use the read-only Neo4j backend. |

c(project_path: str | Path) -> CAnalysis

Create a C analysis facade for the given project directory.

analysis(project_path: str | Path | None = None, source_code: str | None = None, eager: bool = False, analysis_level: str = AnalysisLevel.symbol_table, target_files: List[str] | None = None, analysis_backend_path: str | None = None, analysis_json_path: str | Path | None = None, cache_dir: str | Path | None = None, use_codeql: bool = True, use_ray: bool = False, neo4j_config: Neo4jConnectionConfig | None = None) -> JavaAnalysis | PythonAnalysis | CAnalysis | TypeScriptAnalysis

Deprecated entry point. Use the per-language factory methods instead.

CLDK(language).analysis(...) is retained as a thin compatibility shim that forwards to java / python / typescript / c with an appropriate backend= configuration object.

The former analysis_json_path is folded into the unified cache_dir: it is used as the cache root when cache_dir is not given. analysis_backend_path is no longer supported. The backend binary ships with the packaged dependency, so any value passed for it is ignored.
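The argument handling described above can be sketched as a small normalization step. This is not the shim's actual code: migrate_legacy_kwargs is a hypothetical helper that mirrors the two stated rules, and how the resulting cache_dir is then handed to a backend config object is left open here.

```python
def migrate_legacy_kwargs(legacy):
    """Normalize keyword arguments written for the deprecated analysis() call."""
    kwargs = dict(legacy)  # copy, so the caller's dict is left untouched
    # Rule 1: analysis_backend_path is ignored, so drop it.
    kwargs.pop("analysis_backend_path", None)
    # Rule 2: analysis_json_path becomes the cache root unless cache_dir is given.
    json_path = kwargs.pop("analysis_json_path", None)
    if kwargs.get("cache_dir") is None and json_path is not None:
        kwargs["cache_dir"] = json_path
    return kwargs


old = {
    "project_path": "commons-cli",
    "analysis_json_path": "out",
    "analysis_backend_path": "/opt/codeanalyzer.jar",
}
print(migrate_legacy_kwargs(old))
# {'project_path': 'commons-cli', 'cache_dir': 'out'}
```

An explicit cache_dir always wins over analysis_json_path, which matches the precedence stated in the paragraph above.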

Deprecated: use java, python, typescript, or c with a backend=<config> object.

treesitter_parser() -> TreesitterJava

Return a Tree-sitter parser for the selected language.

Creates and returns a language-specific Tree-sitter parser instance that can be used for syntactic analysis, AST traversal, and code querying operations. Tree-sitter provides incremental parsing with excellent performance characteristics for real-time code analysis.

The returned parser provides methods for:

  • Parsing source code into an AST
  • Running Tree-sitter queries to extract code patterns
  • Extracting syntactic elements (methods, classes, imports, etc.)
  • Performing lexical analysis

Returns:

  • TreesitterJava: A Tree-sitter parser wrapper for Java source code. The parser provides methods such as is_parsable, get_raw_ast, get_all_imports, and various code extraction utilities.

Raises:

  • NotImplementedError: If the language specified during CLDK initialization does not have a Tree-sitter parser implementation. Currently, only Java is supported.

Note: The Tree-sitter parser operates at the syntactic level only and does not perform semantic analysis. For semantic information like resolved types or call graphs, use analysis instead.

See Also

  • TreesitterJava: Java Tree-sitter parser implementation.

tree_sitter_utils(source_code: str) -> TreesitterSanitizer

Return Tree-sitter-based code sanitization utilities for the selected language.

Creates and returns a utility class that provides code transformation and sanitization operations using Tree-sitter for parsing. These utilities are particularly useful for preparing code for LLM consumption, test generation, and code analysis tasks.

The sanitization utilities provide operations such as:

  • Removing unused imports from source code
  • Keeping only focal methods and their callees for context reduction
  • Extracting and manipulating test assertions
  • Identifying and removing dead code

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| source_code | str | The source code string to initialize the utilities with. This code will be parsed and made available for transformation operations. Must be valid syntax for the target language. |

Returns:

  • TreesitterSanitizer: A utility wrapper that provides sanitization and transformation methods for Java source code, including keep_only_focal_method_and_its_callees and remove_unused_imports.

Raises:

  • NotImplementedError: If the language specified during CLDK initialization does not have sanitization utilities implemented. Currently, only Java is supported.

Note: The sanitization utilities modify code at the syntactic level using Tree-sitter patterns. For complex refactoring that requires semantic understanding, consider using the full analysis capabilities via analysis.

See Also

  • TreesitterSanitizer: Java sanitization utility implementation.
  • treesitter_parser: For raw Tree-sitter parsing without sanitization utilities.