Skip to content

[SQL] Share indexes between joins when possible#5832

Merged
mihaibudiu merged 1 commit intomainfrom
issue5815
Mar 19, 2026
Merged

[SQL] Share indexes between joins when possible#5832
mihaibudiu merged 1 commit intomainfrom
issue5815

Conversation

@mihaibudiu
Copy link
Contributor

@mihaibudiu mihaibudiu commented Mar 15, 2026

Fixes #5815

This introduces a new graph-rewriting analysis, looking for the following pattern:

         source
         /    \
     index   index
      /          \
   join         join

If the two indexes have the same key, this is rewritten to:

        source
           |
         index 
       /     \
    join     join

where the index computes the union of the values needed by the two joins.
This enables the two integrators associated with the common input of the joins to be shared.

Describe Manual Test Plan

Ran Java tests to completion. Added some custom unit tests.
Will submit a PR to feldera-qa with an additional large-scale test.

@mihaibudiu
Copy link
Contributor Author

I will mark this as draft to add a few more tests.

@mihaibudiu mihaibudiu marked this pull request as draft March 15, 2026 04:04
@lalithsuresh
Copy link
Contributor

@mihaibudiu will this happen through the graph or just close to input sources?

@ryzhyk
Copy link
Contributor

ryzhyk commented Mar 15, 2026

@mihaibudiu , from the description, this only partially fixes #5815. We'd like index reuse to apply to other operators and to PK indexes as well. These could be separate future PRs, but we shouldn't close #5815 after this PR.

Before merging this, we should check that this generates expected DBSP dataflows with shared integrals.

@mihaibudiu
Copy link
Contributor Author

inputs do not appear in this pattern

@mihaibudiu mihaibudiu force-pushed the issue5815 branch 3 times, most recently from 0022f68 to 63b2bdd Compare March 16, 2026 07:44
@mihaibudiu
Copy link
Contributor Author

image

@mihaibudiu mihaibudiu marked this pull request as ready for review March 16, 2026 18:06
@mihaibudiu
Copy link
Contributor Author

@ryzhyk does the above profile fragment prove that indexes are shared?

@ryzhyk
Copy link
Contributor

ryzhyk commented Mar 16, 2026

@ryzhyk does the above profile fragment prove that indexes are shared?

This fragment only contains one join, so no.

@mihaibudiu
Copy link
Contributor Author

Here is the full graph:
image

@mihaibudiu
Copy link
Contributor Author

This is the source SQL

CREATE LOCAL VIEW V1 AS SELECT
                    o.order_id,
                    o.amount,
                    c.name
                FROM orders1 AS o
                LEFT JOIN customers AS c
                    ON o.customer_id = c.customer_id;
                
                CREATE LOCAL VIEW V2 AS SELECT
                    o.order_id,
                    o.amount,
                    c.first
                FROM orders2 AS o
                LEFT JOIN customers AS c
                    ON o.customer_id = c.customer_id;
                
                CREATE VIEW V AS (SELECT * FROM V1) UNION ALL (SELECT * FROM V2);

@ryzhyk
Copy link
Contributor

ryzhyk commented Mar 16, 2026

Here is the full graph: image

yep, this one shows reuse!

Copy link
Collaborator

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mihaibudiu
Copy link
Contributor Author

This PR won't be the last word even for joins. Today the optimization runs early in the pipeline, but after other optimizations are executed, more opportunities for sharing may surface, which the code here does not support. For example, the inputs could be map_index_flatmap operators. So we will need to revisit this.

@mihaibudiu
Copy link
Contributor Author

I think this will need to be rebased on #5752

@ryzhyk
Copy link
Contributor

ryzhyk commented Mar 17, 2026

@mihaibudiu , how does this handle GC? If one or both joins consuming the stream are able to GC it, the effective waterline should be the greatest lower bound of the two.

@mihaibudiu
Copy link
Contributor Author

This is done before GC. It may inhibit GC, I haven't checked. But none of our tests was affected so far.

@mihaibudiu
Copy link
Contributor Author

Or, more likely, GC may undo this optimization

@ryzhyk
Copy link
Contributor

ryzhyk commented Mar 17, 2026

Or, more likely, GC may undo this optimization

We really need a test for this. Undoing the optimization is fine in this case, but we have to make sure it doesn't do unsound stuff like apply one or both of the individual bounds. Likewise, disabling GC altogether in this case would also be a serious issue.

@mihaibudiu
Copy link
Contributor Author

Indeed, when using LATENESS what happens is that first the index is shared, then later it is unshared by inserting a noop, and LATENESS is applied separately after each index operator.

@mihaibudiu
Copy link
Contributor Author

mihaibudiu commented Mar 17, 2026

We don't have an optimization to compute the min of two waterlines, that's TBD. But the result is correct.

@mihaibudiu
Copy link
Contributor Author

After sharing indexes:

image

Before lateness computation starts: notice the noop nodes inserted

image

After lateness inserts GC operators:

limited

Final graph:

final

@mihaibudiu
Copy link
Contributor Author

Without lateness the first graph is essentially the output graph as well, with shared indexes.

@mihaibudiu mihaibudiu force-pushed the issue5815 branch 2 times, most recently from dd61898 to 94b2a8e Compare March 19, 2026 19:01
@mihaibudiu mihaibudiu added this pull request to the merge queue Mar 19, 2026
@lalithsuresh lalithsuresh added the marketing Relevant for marketing content label Mar 19, 2026
@mihaibudiu mihaibudiu removed this pull request from the merge queue due to a manual request Mar 19, 2026
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
@mihaibudiu mihaibudiu enabled auto-merge March 19, 2026 20:54
@mihaibudiu mihaibudiu added this pull request to the merge queue Mar 19, 2026
Merged via the queue into main with commit a14b015 Mar 19, 2026
1 check passed
@mihaibudiu mihaibudiu deleted the issue5815 branch March 19, 2026 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

marketing Relevant for marketing content

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[SQL] Share indexes between joins

4 participants