[BugFix] Fix chained streamstats with window causing NPE (#4800)#5359
Conversation
…ecorrelator (opensearch-project#4800) Replace the correlate-based plan with a self-join plan for the global=true + window + group case in streamstats. Nested correlates (produced when chaining two streamstats with window + group by) caused NPE in Calcite's RelDecorrelator.createValueGenerator during plan preparation. The self-join approach builds a LogicalJoin instead of LogicalCorrelate, which avoids triggering the decorrelation path that fails with nested correlation variable references. Signed-off-by: Heng Qian <qianheng@amazon.com>
Decision LogRoot Cause: The Approach: Replaced the correlate-based plan with a self-join ( Alternatives Rejected:
Pitfalls:
Things to Watch:
|
PR Reviewer Guide 🔍(Review updated until commit 19d74db)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 19d74db Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit a1adf1e
Suggestions up to commit 45cda73
Suggestions up to commit 7c8f991
Suggestions up to commit b5e5995
|
- Update explain_streamstats_global.yaml and explain_streamstats_global_null_bucket.yaml for both pushdown and no-pushdown cases to reflect the new self-join plan (LogicalJoin + LogicalAggregate) instead of the old LogicalCorrelate plan - Guard zero-argument branch in buildSingleAggCall to only allow COUNT() - Use LinkedHashSet for aggInputFields for deterministic field ordering - Add assertions to testMultipleStreamstatsWithWindow unit test Signed-off-by: Heng Qian <qianheng@amazon.com>
|
Persistent review updated to latest commit 7c8f991 |
Merged origin/main which includes version bump to OpenSearch 3.7 (Jackson 2 to 3 parser API change that removes empty _source.excludes arrays from serialization) and the unified SQL language spec PR. Two conflicts in streamstats global explain YAMLs were resolved by: - Keeping this PR's new LogicalJoin-based plan (replaces LogicalCorrelate) - Adopting the new serialization format that omits "excludes":[] Signed-off-by: Heng Qian <qianheng@amazon.com>
|
Persistent review updated to latest commit 45cda73 |
| } | ||
| RexNode argRef = context.relBuilder.field(rightFieldIndex); | ||
|
|
||
| aggCall = |
There was a problem hiding this comment.
can this code aggCall=... reuse the CalciteAggCallVisitor? Or we have to maintain multiple similar logic spots
There was a problem hiding this comment.
Good call — refactored in 19d74db to reuse aggVisitor (CalciteAggCallVisitor). Instead of switching on the PPL function name to build AggCalls, the new code rewrites the window function's field references to the prefixed right-side names (SAL -> r_SAL), unwraps the WindowFunction to its inner Function, and delegates to aggVisitor.analyze(). This drops ~100 lines of duplicated AVG/SUM/MIN/MAX/COUNT/DC/STDDEV_/VAR_/EARLIEST/LATEST mapping and keeps the streamstats self-join path consistent with regular stats/eventstats aggregation handling.
|
Persistent review updated to latest commit a1adf1e |
Address review feedback from @LantaoJin: instead of duplicating the PPL-window-function to Calcite-AggCall mapping inside buildSingleAggCall, rewrite the window function's field references to the prefixed right-side column names and delegate aggregate resolution to the shared aggVisitor. This eliminates ~100 lines of parallel function-name switching (AVG/SUM/MIN/MAX/COUNT/DC/STDDEV_*/VAR_*/ EARLIEST/LATEST) and keeps the streamstats self-join path consistent with regular stats/eventstats aggregation handling. Also extracts the "__r_<name>__" naming convention into named constants (RIGHT_SIDE_FIELD_PREFIX / RIGHT_SIDE_FIELD_SUFFIX / RIGHT_SIDE_SEQ_COLUMN) and a toRightSideFieldName helper so the prefix is defined in one place. Harness: add a reminder under Path B (AST / Function Implementation) in ppl-bugfix-reference.md to reuse aggVisitor / rexVisitor before hand-rolling a new function-name switch. Signed-off-by: Heng Qian <qianheng@amazon.com>
a1adf1e to
19d74db
Compare
|
Persistent review updated to latest commit 19d74db |
Description
Chained
streamstatscommands withwindow+group bycaused NPE in Calcite'sRelDecorrelator.createValueGeneratorduring query plan preparation. This happened because theglobal=true + window + grouppath generatedLogicalCorrelatenodes, and when two such streamstats were chained, the second correlate contained the first inside its right side, creating nested correlates that Calcite's decorrelator couldn't handle.Root Cause: The
buildStreamWindowJoinPlanmethod created aLogicalCorrelatewhere both left and right sides shared the same plan tree. When chaining multiple streamstats, nested correlates triggered NPE inRelDecorrelator.createValueGeneratorat line 1272 (frame.oldToNewOutputs.get(oldCorVarOffset)returned null).Fix: Replaced the correlate-based plan with a self-join (
LogicalJoin) plan for theglobal=true + window + groupcase. The self-join approach:This avoids generating
LogicalCorrelatenodes for this code path entirely.Related Issues
Resolves #4800
Check List
-s)spotlessCheckpassed