feat: support index optimizer for join/subquery in multi stream #8575
PR Reviewer Guide 🔍 Here are some key observations to aid the review process:
Greptile Summary
This PR extends the index optimizer functionality to support multi-stream queries (joins and subqueries) in OpenObserve. The changes transform the index optimizer from handling single-stream queries to processing multiple streams by collecting index fields from all schemas in the SQL query.
The key architectural change is in mod.rs, where the single-stream constraint (if sql.stream_names.len() == 1) is removed and replaced with logic that builds a HashMap<TableReference, HashSet<String>> mapping each stream to its index fields. This lets the optimizer keep index field information per table rather than in a single global set.
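A minimal sketch of the per-table collection step, assuming a hypothetical StreamSchema type that carries a stream's columns and its configured index fields; a plain String stands in for DataFusion's TableReference, and dropping index fields absent from the schema is an illustrative choice, not necessarily what the PR does:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for one stream in the query: its schema columns
/// and the fields configured as indexed for that stream.
struct StreamSchema {
    name: String,
    fields: Vec<String>,
    index_fields: Vec<String>,
}

/// Builds a per-table map of index fields (String stands in for
/// DataFusion's TableReference). Index fields missing from the schema
/// are dropped, so every entry refers to a real column.
fn collect_index_fields(schemas: &[StreamSchema]) -> HashMap<String, HashSet<String>> {
    let mut map = HashMap::new();
    for schema in schemas {
        let columns: HashSet<&str> = schema.fields.iter().map(String::as_str).collect();
        let indexed: HashSet<String> = schema
            .index_fields
            .iter()
            .filter(|f| columns.contains(f.as_str()))
            .cloned()
            .collect();
        // Every stream gets an entry, even if it has no usable index fields.
        map.insert(schema.name.clone(), indexed);
    }
    map
}
```

Keying by table means a join between two streams that both have a field called "host" can still tell whose index applies, which a single global set cannot.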
In the physical optimizer module, the LeaderIndexOptimizerRule is updated to work with this new per-table index field structure. A new TableNameVisitor is introduced to extract table names from execution plans by traversing the plan tree and identifying NewEmptyExec nodes, which represent the actual data sources.
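The table-name extraction can be pictured as a depth-first walk that records the table behind every leaf scan. This sketch uses a simplified, hypothetical Plan enum rather than DataFusion's ExecutionPlan trait objects, and the real TableNameVisitor may rely on DataFusion's own tree-visiting machinery instead of manual recursion:

```rust
/// Simplified, hypothetical physical plan tree; only its shape matters here.
enum Plan {
    /// Leaf scan node, standing in for OpenObserve's NewEmptyExec.
    NewEmptyExec { table: String },
    /// Any other operator (join, filter, aggregate, ...) and its inputs.
    Node { children: Vec<Plan> },
}

/// Collects the table name of every NewEmptyExec leaf, depth-first,
/// left to right.
#[derive(Default)]
struct TableNameVisitor {
    tables: Vec<String>,
}

impl TableNameVisitor {
    fn visit(&mut self, plan: &Plan) {
        match plan {
            Plan::NewEmptyExec { table } => self.tables.push(table.clone()),
            Plan::Node { children } => {
                for child in children {
                    self.visit(child);
                }
            }
        }
    }
}
```

For a join, the visitor returns one name per side, so the caller must decide what to do when a subtree reads from more than one table.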
The optimization logic now looks up the appropriate index fields for each specific table during query execution, enabling proper index-based optimizations for complex multi-stream scenarios. This change integrates with the existing DataFusion optimizer framework and maintains compatibility with single-stream queries while extending capabilities to joins and subqueries.
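To illustrate the per-table lookup, here is a hedged sketch of a hypothetical eligibility check: a subtree qualifies only if every column its filter references is an index field of that table. The "all filter columns indexed" rule and the can_use_index name are assumptions for illustration, not the PR's actual criterion; an unknown table simply disables the optimization:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical check: true only if every filter column is an index field
/// of `table`. Unknown tables and empty filters are treated as "not
/// optimizable" rather than causing an error.
fn can_use_index(
    index_fields: &HashMap<String, HashSet<String>>,
    table: &str,
    filter_columns: &[&str],
) -> bool {
    match index_fields.get(table) {
        Some(fields) => {
            !filter_columns.is_empty() && filter_columns.iter().all(|c| fields.contains(*c))
        }
        None => false,
    }
}
```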
Confidence score: 3/5
- This PR has potential issues that need attention before merging safely
- Score reflects concerns about error handling and the robustness of the table name extraction logic
- Pay close attention to the physical optimizer module, particularly the TableNameVisitor implementation
Context used: Avoid using expect with potentially failing operations; instead, handle the None case to prevent panics.
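A sketch of the pattern this guideline asks for, applied to table-name extraction: instead of calling expect on the visitor's output, return None and leave the plan unchanged when the subtree does not map to exactly one table. The single_table and index_fields_for helpers are hypothetical, and the "exactly one table" condition is an assumption for illustration:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: yields the table name only when the subtree reads
/// from exactly one table, instead of `tables.first().expect(...)`.
fn single_table(tables: &[String]) -> Option<&str> {
    match tables {
        [only] => Some(only.as_str()),
        // Zero tables, or a join touching several: skip, don't panic.
        _ => None,
    }
}

/// Looks up index fields for the subtree's table; `?` propagates None so
/// the caller can return the original plan untouched.
fn index_fields_for<'a>(
    tables: &[String],
    index_fields: &'a HashMap<String, HashSet<String>>,
) -> Option<&'a HashSet<String>> {
    let table = single_table(tables)?;
    index_fields.get(table)
}
```

Returning Option keeps a malformed or unexpected plan shape from crashing the query; the worst case is that the index optimization is skipped.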
2 files reviewed, 2 comments
src/service/search/datafusion/optimizer/physical_optimizer/index_optimizer/mod.rs (resolved)
src/service/search/datafusion/optimizer/physical_optimizer/index_optimizer/mod.rs (outdated, resolved)
PR Code Suggestions ✨ Explore these optional code suggestions:
…serve/openobserve into feat-index-optimizer-multi-stream
related to #8173
After this PR, the subquery in the SQL below can be sped up by the index optimizer.