This directory holds the source code for a compiler translating SQL view definitions into DBSP circuits. The SQL compiler is based on the Apache Calcite compiler infrastructure https://calcite.apache.org/
The code has been tested on Windows 10/11 and Linux. It has the following requirements:
-
Java 19, 20, or 21. Other versions will not work.
-
The maven (mvn) Java build program, which will install all required Java dependencies.
The code generated by the compiler is Rust. To run it you need a working installation of Rust (we recommend using rustup to install: https://www.rust-lang.org/tools/install). The compiler uses the current version of the DBSP library sources from Github (https://github.com/feldera/feldera).
If you want to generate images of the query plans you need to also install graphviz as described here https://graphviz.org/download/.
To build the compiler run:
./build.sh
If you get Rust compilation errors you should try to make sure you have the latest version of the Rust libraries and toolchain:
$ rustup update
$ cd temp
$ cargo update
(temp is the directory where the tests write the generated Rust code.)
The DBSP runtime is optimized for performing incremental view maintenance. In consequence, DBSP programs in SQL are expressed as VIEWS, or standing queries. A view is essentially a computation that describes a relation that is computed from other tables or views.
For example, the following query defines a view:
CREATE VIEW V AS SELECT T.COL1 FROM T WHERE T.age > 18In order to interpret this query the compiler needs to have been given a definition of table (or view) T. The table T should be defined using a SQL Data Definition Language (DDL) statement, e.g.:
CREATE TABLE T
(
name VARCHAR,
age INT,
present BOOLEAN
)The compiler must be given the table definition first, and then the view definition. The compiler generates a Rust library which implements the query as a function: given the input data, it produces the output data.
The compiler can (optionally) generate a library which will
incrementally maintain the view V when presented with changes to
table T:
table changes
V
tables -----> SQL-to-DBSP compiler ------> DBSP circuit
views V
view changes
See the documentation in using.md.
A presentation about the internals of the compiler implementation.
Compilation proceeds in several stages:
- SQL statements are parsed using the calcite SQL parser
generating an IR representation using the Calcite
SqlNodedata types - the SQL IR tree is validated, optimized, and converted to the Calcite
RelNoderepresentation - The result of this stage is a
CalciteProgramdata structure, which packages together all the views that are being compiled (multiple views can be maintained simultaneously) CalciteToDBSPCompilerconverts aCalciteProgramdata structure into aDBSPCircuitdata structure.- The
CircuitOptimizerclass can optimize the generated circuit, optionally converting it into an incremental circuit, which is expected to compute only on changes. - The
circuitcan be serialized as Rust using theToRustStringvisitor.
Unit tests are written using JUnit and test pointwise parts of the
compiler. They can be executed using mvn test. They also run as a
side effect of mvn package (omitting -DskipTests).
The unit tests are implemented as individual Java files under
sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/.
Each unit test uses the SQL to DBSP compiler to generate Rust code under
sql-to-dbsp-compiler/temp and compiles and runs that code.
If one of the unit tests fails, you may re-populate the Rust code for
it into temp by telling mvn to run only that particular test. If
an error message indicates that
org.dbsp.sqlCompiler.compiler.postgres.PostgresTimestampTests
failed, for example, you may rerun just that test, first with mvn test and then again with cargo test, via:
$ mvn test -Dtest=PostgresTimestampTests
$ cd ../temp
$ cargo test
One of the means of testing the compiler is using sqllogictests: https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki.
The sqllogictests includes more than 5 million tests. It takes weeks to run all of them. Most of the time is spent compiling Rust. We hope to speed that up at some point.
We have implemented a general-purpose testing framework in Java for running SqlLogicTest programs. The framework parses SqlLogicTest files and creates an internal representation of these files. The files are executed by "test executors".
The tests are run by a standalone executable. The executable supports the following command-line arguments:
slt [options] files_or_directories_with_tests
Executes the SQL Logic Tests using a SQL execution engine
See https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki
Options:
-h Show this help message and exit
-x Stop at the first encountered query error
-n Do not execute, just parse the test files
-e executor Executor to use
-b filename Load a list of buggy commands to skip from this file
-v Increase verbosity (can be repeated)
-inc Incremental validation
-u username Postgres user name
-p password Postgres password
Registered executors:
hybrid
dbsp
hsql
psql
none
We have multiple executors. Some executors are inherited from the hydromatic project.
This executor is part of the base hydromatic package; it does not really run any tests. But it can still be used by the test loading mechanism to check that we correctly parse all SQL logic test files.
The model of SQLLogicTest has to be adapted for testing using DBSP.
Since DBSP is not a database, but a streaming system, some SQL
statements are ignored (e.g., CREATE INDEX) and some other
cannot be supported (e.g., CREATE UNIQUE INDEX).
- The DDL statements to create tables are compiled into definitions of circuit inputs.
- The DML INSERT statements are converted into input-generating functions. In the absence of a database we cannot really execute statements that are supposed to fail, so we ignore such statements.
- Some DML statements like DELETE based on a WHERE clause cannot be compiled at all
- The queries are converted into DDL VIEW create statements, which are compiled into circuits.
- The validation rules in SQLLogicTest are compiled into Rust functions that compare the outputs of queries.
So a SqlLogicTest script is turned into multiple DBSP tests, each of which creates a circuit, feeds it one input, reads the output, and validates it, executing exactly one transaction. When testing incremental circuits the test code feeds multiple inputs and only checks the final output.
This executor is a combination of the DBSP executor and the JDBC executor, using a real database to store data in tables, but using DBSP as a query engine. It should be able to execute all SqlLogicTest queries that are supported by the underlying database.