[SQL] Share indexes between joins when possible#5832
Conversation
|
I will mark this as draft to add a few more tests. |
|
@mihaibudiu will this happen through the graph or just close to input sources? |
|
@mihaibudiu , from the description, this only partially fixes #5815. We'd like index reuse to apply to other operators and to PK indexes as well. These could be separate future PRs, but we shouldn't close #5815 after this PR. Before merging this, we should check that this generates expected DBSP dataflows with shared integrals. |
|
inputs do not appear in this pattern |
0022f68 to
63b2bdd
Compare
|
@ryzhyk does the above profile fragment prove that indexes are shared? |
This fragment only contains one join, so no. |
|
This is the source SQL |
|
This PR won't be the last word even for joins. Today the optimization runs early in the pipeline, but after other optimizations are executed, more opportunities for sharing may surface, which the code here does not support. For example, the inputs could be map_index_flatmap operators. So we will need to revisit this. |
9cacdb9 to
1dbedc4
Compare
|
I think this will need to be rebased on #5752 |
|
@mihaibudiu , how does this handle GC? If one or both joins consuming the stream are able to GC it, the effective waterline should be the greatest lower bound of the two. |
|
This is done before GC. It may inhibit GC, I haven't checked. But none of our tests was affected so far. |
|
Or, more likely, GC may undo this optimization |
We really need a test for this. Undoing the optimization is fine in this case, but we have to make sure it doesn't do unsound stuff like apply one or both of the individual bounds. Likewise, disabling GC altogether in this case would also be a serious issue. |
|
Indeed, when using LATENESS what happens is that first the index is shared, then later it is unshared by inserting a noop, and LATENESS is applied separately after each index operator. |
|
We don't have an optimization to compute the min of two waterlines, that's TBD. But the result is correct. |
|
Without lateness the first graph is essentially the output graph as well, with shared indexes. |
857e63b to
d2e1562
Compare
f201582 to
d4c85c2
Compare
1bd8dc8 to
b839db5
Compare
dd61898 to
94b2a8e
Compare
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>







Fixes #5815
This introduces a new graph-rewriting analysis, looking for the following pattern:
If the two indexes have the same key, this is rewritten to:
where the index computes the union of the values needed by the two joins.
This enables the two integrators associated with the common input of the joins to be shared.
Describe Manual Test Plan
Ran Java tests to completion. Added some custom unit tests.
Will submit a PR to feldera-qa with an additional large-scale test.