Skip to content

Conversation

@haohuaijin
Copy link
Collaborator

@haohuaijin haohuaijin commented Sep 24, 2025

close #8173

  • check if fst field in the stream when use match_all
match_all will not support the below case
   1. join
   1. from clause is a subquery/cte and outside has match_all, select * from (select
      kubernetes_namespace_name, count(*) from t1 group by kubernetes_namespace_name order by
      count(*)) where match_all('error')
match_all support in subquery like below, for example:
   1. select * from (select * from t1 where match_all('error'))
   2. select * from t1 where id in (select id from t2 where match_all('error')) and
      match_all('critical')

@github-actions
Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid false rejection without match_all

This currently rejects queries even when they do not use match_all(). Gate the error
behind has_match_all so normal multi-stream queries continue. This prevents false
negatives where is_support_match_all is false but match_all() is absent.

src/service/search/sql/mod.rs [186-190]

-if !match_visitor.is_support_match_all {
+if match_visitor.has_match_all && !match_visitor.is_support_match_all {
     return Err(Error::ErrorCode(ErrorCodes::SearchSQLNotValid(
         "Using match_all() function in a subquery/join is not supported".to_string(),
     )));
 }
Suggestion importance[1-10]: 8

__

Why: Correctly gates the error on has_match_all, preventing rejecting valid queries lacking match_all() when is_support_match_all is false; this fixes a functional regression with high impact and aligns with the visitor’s intent.

Medium
General
Robust function name matching

Comparing function names via to_string().to_lowercase() can misidentify qualified
names or quoted identifiers. Normalize by checking the last identifier and doing a
case-insensitive compare to avoid false negatives/positives.

src/service/search/sql/visitor/match_all.rs [93-109]

 pub fn has_match_all(expr: &Expr) -> bool {
-    let mut has_match_all = false;
+    let mut has_match_all_flag = false;
     let _ = visit_expressions(expr, |expr| {
         if let Expr::Function(func) = expr {
-            let name = func.name.to_string().to_lowercase();
-            if (name == MATCH_ALL_UDF_NAME || name == FUZZY_MATCH_ALL_UDF_NAME)
-                && let FunctionArguments::List(list) = &func.args
-                && !list.args.is_empty()
-            {
-                has_match_all = true;
-                return ControlFlow::Break(());
+            if let Some(ident) = func.name.0.last() {
+                let name_lc = ident.value.to_lowercase();
+                if (name_lc == MATCH_ALL_UDF_NAME || name_lc == FUZZY_MATCH_ALL_UDF_NAME)
+                    && matches!(&func.args, FunctionArguments::List(list) if !list.args.is_empty())
+                {
+                    has_match_all_flag = true;
+                    return ControlFlow::Break(());
+                }
             }
         }
         ControlFlow::Continue(())
     });
-    has_match_all
+    has_match_all_flag
 }
Suggestion importance[1-10]: 7

__

Why: Using the last identifier avoids issues with qualified or quoted function names and improves correctness of has_match_all; it’s a targeted, accurate enhancement with meaningful reliability gains.

Medium
Detect match_all in all clauses

This only checks select.selection and misses match_all() inside HAVING clauses, ON
join predicates, or nested boolean expressions. Expand detection to scan WHERE,
HAVING, and JOIN ON expressions using visit_expressions to avoid unsupported cases
slipping through.

src/service/search/sql/visitor/match_all.rs [61-88]

 if let sqlparser::ast::SetExpr::Select(select) = query.body.as_ref() {
-    // if from clause has more than one table, where clause should not have match_all
-    if select.from.len() > 1 && select.selection.as_ref().is_some_and(has_match_all) {
+    let where_has = select.selection.as_ref().is_some_and(has_match_all);
+    let having_has = select.having.as_ref().is_some_and(has_match_all);
+    let joins_have = select
+        .from
+        .iter()
+        .flat_map(|from| from.joins.iter())
+        .any(|j| j.condition.on().is_some_and(has_match_all));
+
+    let any_clause_has = where_has || having_has || joins_have;
+
+    // more than one table in FROM
+    if select.from.len() > 1 && any_clause_has {
         self.is_support_match_all = false;
     }
-
-    // if from clause has join, where clause should not have match_all
-    if select.from.iter().any(|from| !from.joins.is_empty())
-        && select.selection.as_ref().is_some_and(has_match_all)
-    {
+    // explicit joins
+    if select.from.iter().any(|from| !from.joins.is_empty()) && any_clause_has {
         self.is_support_match_all = false;
     }
-
-    // if from clause has a subquery, where clause should not have match_all
+    // outer WHERE/HAVING/ON with derived table in FROM
     if select
         .from
         .iter()
         .any(|from| matches!(from.relation, TableFactor::Derived { .. }))
-        && select.selection.as_ref().is_some_and(has_match_all)
+        && any_clause_has
     {
         self.is_support_match_all = false;
     }
-
-    // if query has CTE (Common Table Expression), where clause should not have match_all
-    if query.with.is_some() && select.selection.as_ref().is_some_and(has_match_all) {
+    // CTE with outer WHERE/HAVING/ON
+    if query.with.is_some() && any_clause_has {
         self.is_support_match_all = false;
     }
 }
Suggestion importance[1-10]: 6

__

Why: Expanding detection to HAVING and JOIN ON makes the restriction more complete and reduces edge cases; however, it proposes broader logic than currently present and may need API checks (e.g., join condition access), so impact is moderate.

Low

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Summary

This PR enhances the match_all functionality to support multi-stream queries by refactoring the validation logic. The key changes include:

  • Simplified validation in mod.rs: Removed complex full-text search field validation and stream setting checks, delegating multi-stream detection to the visitor pattern
  • Enhanced MatchVisitor logic: Completely rewrote the visitor to precisely identify unsupported scenarios where match_all is used with joins, multi-table FROM clauses, derived tables, or CTEs
  • Improved semantic clarity: Renamed is_multi_stream to is_support_match_all with inverted boolean logic for better readability
  • Comprehensive test coverage: Added extensive tests covering various SQL patterns including joins, subqueries, CTEs, and multi-table scenarios

The implementation now correctly allows match_all within subqueries while preventing its use in complex multi-stream scenarios that would be difficult to optimize.

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk, containing well-tested logic improvements
  • Score reflects solid implementation with comprehensive test coverage and clear logic, with only a minor typo found
  • No files require special attention - the implementation is well-structured and thoroughly tested

Important Files Changed

File Analysis

Filename        Score        Overview
src/service/search/sql/mod.rs 4/5 Simplified match_all validation logic by removing full-text search field checks and relying on visitor pattern for multi-stream detection
src/service/search/sql/visitor/match_all.rs 5/5 Enhanced match_all visitor with comprehensive multi-stream detection logic and extensive test coverage for various SQL scenarios

Sequence Diagram

sequenceDiagram
    participant Client
    participant SQL_Parser as SQL Parser
    participant MatchVisitor as Match Visitor  
    participant MatchAllDetector as Match All Detector
    participant ValidationEngine as Validation Engine

    Client->>SQL_Parser: Parse SQL Query with match_all()
    SQL_Parser->>MatchVisitor: Create MatchVisitor instance
    MatchVisitor->>MatchVisitor: Initialize is_support_match_all = true
    
    SQL_Parser->>MatchVisitor: Visit AST expressions
    MatchVisitor->>MatchAllDetector: Check if expression contains match_all()
    MatchAllDetector-->>MatchVisitor: Return has_match_all = true/false
    
    SQL_Parser->>MatchVisitor: Visit query structure
    MatchVisitor->>MatchVisitor: Check for joins (FROM table1 JOIN table2)
    MatchVisitor->>MatchVisitor: Check for multi-table FROM (FROM table1, table2)
    MatchVisitor->>MatchVisitor: Check for derived tables (FROM (subquery))
    MatchVisitor->>MatchVisitor: Check for CTEs (WITH clause)
    
    alt Multi-stream query with match_all in WHERE clause
        MatchVisitor->>MatchVisitor: Set is_support_match_all = false
    else Single stream or match_all in subquery
        MatchVisitor->>MatchVisitor: Keep is_support_match_all = true
    end
    
    MatchVisitor-->>SQL_Parser: Return validation results
    
    SQL_Parser->>ValidationEngine: Check is_support_match_all
    alt is_support_match_all == false
        ValidationEngine-->>Client: Error: match_all() in subquery/join not supported
    else is_support_match_all == true
        ValidationEngine-->>Client: Query validation successful
    end
Loading

2 files reviewed, 1 comment

Edit Code Review Bot Settings | Greptile

@haohuaijin haohuaijin merged commit 1a8e749 into main Sep 25, 2025
29 checks passed
@haohuaijin haohuaijin deleted the feat-multi-stream-match-all branch September 25, 2025 03:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

support tantivy index in multi stream

3 participants