felderize attempts to translate Spark SQL schemas and queries into valid Feldera SQL using LLM-based translation with optional compiler validation.
```shell
cd python/felderize
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```

Note: `pip install -e .` is required before running `felderize`. It registers the package and CLI command.
Create a `.env` file:

```shell
ANTHROPIC_API_KEY=your-key-here
FELDERA_COMPILER=/path/to/sql-to-dbsp  # in Feldera repo: ../../sql-to-dbsp-compiler/SQL-compiler/sql-to-dbsp
FELDERIZE_MODEL=claude-sonnet-4-5
```

The `FELDERA_COMPILER` path is required for validation. Without it, translation still works, but the output SQL is not verified. You can also pass it per-command with `--compiler PATH`.
The compiler must be built before use (requires Java 19–21 and Maven):

```shell
cd sql-to-dbsp-compiler
./build.sh
```

```shell
# List available examples
felderize example

# Translate an example (validates by default)
felderize example simple

# Without compiler validation
felderize example simple --no-validate

# Log SQL submitted to the validator at each attempt
felderize example json --verbose

# Use a specific compiler binary
felderize example simple --compiler /path/to/sql-to-dbsp

# Output as JSON
felderize example simple --json-output
```

Available examples:
| Name | Description |
|---|---|
| `simple` | Date truncation, GROUP BY |
| `strings` | INITCAP, LPAD, NVL, CONCAT_WS |
| `arrays` | array_contains, size, element_at |
| `joins` | Null-safe equality (`<=>`) |
| `windows` | LAG, running SUM OVER |
| `aggregations` | COUNT DISTINCT, HAVING (includes unsupported: COLLECT_LIST, PERCENTILE_APPROX) |
| `json` | get_json_object → PARSE_JSON + VARIANT access (combined file) |
| `topk` | ROW_NUMBER TopK, QUALIFY, DATEDIFF → TIMESTAMPDIFF (combined file) |
The JSON output contains:
```
{
  "feldera_schema": "...",   // translated DDL (CREATE TABLE statements)
  "feldera_query": "...",    // translated query (CREATE VIEW statements)
  "unsupported": [...],      // unsupported Spark features found
  "warnings": [...],         // non-fatal issues
  "explanations": [...],     // explanations for translation decisions
  "status": "success|unsupported|error"
}
```

Two input formats are supported:
Separate schema and query files:

```shell
felderize translate path/to/schema.sql path/to/query.sql
felderize translate path/to/schema.sql path/to/query.sql --validate
```

Single combined file (CREATE TABLE and CREATE VIEW statements in one file):

```shell
felderize translate-file path/to/combined.sql
felderize translate-file path/to/combined.sql --validate
```

Note: running without `--validate` prints a warning, since the output SQL has not been verified against the Feldera compiler.
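With `--json-output`, results can be consumed by a script. Below is a minimal sketch of handling a payload shaped like the documented output; the `summarize` helper and the sample payload contents are illustrative, not part of felderize itself:

```python
import json

def summarize(result: dict) -> str:
    """Summarize a felderize --json-output payload (fields as documented above)."""
    status = result["status"]
    if status != "success":
        unsupported = ", ".join(result.get("unsupported", []))
        return f"{status}: unsupported features: {unsupported}" if unsupported else status
    n_warn = len(result.get("warnings", []))
    return f"success ({n_warn} warning(s))"

# Sample payload mirroring the documented JSON structure
payload = json.dumps({
    "feldera_schema": "CREATE TABLE t (x INT);",
    "feldera_query": "CREATE VIEW v AS SELECT x FROM t;",
    "unsupported": [],
    "warnings": ["NVL mapped to COALESCE"],
    "explanations": [],
    "status": "success",
})
print(summarize(json.loads(payload)))  # success (1 warning(s))
```

In practice the payload would come from capturing the stdout of a `felderize translate ... --json-output` invocation.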
Both commands accept:
- `--validate` to validate output against the Feldera compiler (opt-in; `example` validates by default, use `--no-validate` to skip)
- `--compiler PATH` to specify the path to the Feldera compiler binary (overrides `FELDERA_COMPILER` env var)
- `--model MODEL` to specify the LLM model (overrides `FELDERIZE_MODEL` env var)
- `--no-docs` to disable Feldera SQL reference docs in the prompt
- `--force-docs` to include docs on the first pass instead of only as a fallback
- `--verbose` to log the SQL submitted to the validator at each repair attempt
- `--json-output` to output results as JSON
Environment variables (set in `.env`):

| Variable | Description | Default |
|---|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key | (required) |
| `FELDERIZE_MODEL` | LLM model to use (can also be set with `--model`) | (required, set in `.env`) |
| `FELDERA_COMPILER` | Path to sql-to-dbsp compiler (can also be set with `--compiler`) | (required for validation) |
- Loads translation rules from a single skill file (`spark/data/skills/spark_skills.md`)
- Sends Spark SQL to the LLM with rules, validated examples, and relevant Feldera SQL documentation (from docs.feldera.com/docs/sql/)
- Parses the translated Feldera SQL from the LLM response
- Optionally validates output against the Feldera compiler, retrying with error feedback if needed
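The validate-and-repair loop in the last step can be sketched as follows. This is not felderize's actual code; `translate` and `validate` are hypothetical callables standing in for the LLM call and the sql-to-dbsp compiler invocation:

```python
from typing import Callable, Optional

def translate_with_repair(
    spark_sql: str,
    translate: Callable[[str, Optional[str]], str],  # (input SQL, last compiler error) -> Feldera SQL
    validate: Callable[[str], Optional[str]],        # returns a compiler error message, or None on success
    max_attempts: int = 3,
) -> str:
    """Translate Spark SQL, retrying with compiler error feedback until validation passes."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        feldera_sql = translate(spark_sql, error)
        error = validate(feldera_sql)
        if error is None:
            return feldera_sql
    raise RuntimeError(f"still failing after {max_attempts} attempts: {error}")
```

On the first pass `error` is `None`; each subsequent attempt feeds the previous compiler error back into the translation prompt, which is the "retrying with error feedback" behavior described above.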
Contact us at support@feldera.com for assistance with unsupported Spark SQL features.