branch-4.1: [fix](runtime-filter) Fix race condition in Set operator runtime filter processing #62434#62811
Open
BiteTheDDDDt wants to merge 1 commit intoapache:branch-4.1from
Open
branch-4.1: [fix](runtime-filter) Fix race condition in Set operator runtime filter processing #62434#62811BiteTheDDDDt wants to merge 1 commit intoapache:branch-4.1from
BiteTheDDDDt wants to merge 1 commit intoapache:branch-4.1from
Conversation
…er processing (apache#62434) This pull request addresses a race condition in the Set operator (`INTERSECT`/`EXCEPT`) when using the `IN_FILTER` runtime filter type. The main fix ensures that the hash table size used for runtime filter processing is captured before the probe side can modify the hash table, preventing errors during query execution. Additionally, regression tests are added to verify the fix and prevent future regressions. **Fixes for race condition in Set operator:** * The hash table size is now snapshotted in `SetSinkOperatorX<is_intersect>::sink()` before enabling the probe pipeline, and this saved value is used for runtime filter processing in `SetSinkLocalState<is_intersect>::close()`. This prevents the probe side from shrinking the hash table before the runtime filter is processed, avoiding possible `InternalError` failures. [[1]](diffhunk://#diff-7658815d4a9a8268b4ef4ad00a93998af07a421b538ad608e3f3a0e20ef805cdR103-R107) [[2]](diffhunk://#diff-7658815d4a9a8268b4ef4ad00a93998af07a421b538ad608e3f3a0e20ef805cdL48-R47) [[3]](diffhunk://#diff-1a8671c9fbc97801353bffeffe99e2189a16d275d3f865714e2cc6750ee8c54fR65-R68) * Added a new member variable `_build_hash_table_size` to `SetSinkLocalState` to store the snapshotted hash table size. **Testing and regression coverage:** * Added a new regression test suite `set_operator_in_filter.groovy` and its output file to reproduce and verify the race condition, ensuring the fix is tested for both `INTERSECT` and `EXCEPT` queries with `IN_FILTER` type and small `max_in_num`. [[1]](diffhunk://#diff-2ff8fca2d88b291375196153f6f9377e6a0d80a2b8222d63ef67d2dc307cb332R1-R144) [[2]](diffhunk://#diff-795c4c2d53c199ee4187b5aef698e62a6c993d2aad99014063c0c3540553caf3R1-R23) **Code cleanup:** * Removed unnecessary `#include "common/compile_check_begin.h"` and `#include "common/compile_check_end.h"` lines from several files for better code clarity. [[1]](diffhunk://#diff-7658815d4a9a8268b4ef4ad00a93998af07a421b538ad608e3f3a0e20ef805cdL27) [[2]](diffhunk://#diff-90e895f6474975cbb1d9b2fc276bc6d9e1a22675f5126b836fb83c13fd148eb5L31) [[3]](diffhunk://#diff-90e895f6474975cbb1d9b2fc276bc6d9e1a22675f5126b836fb83c13fd148eb5L57) [[4]](diffhunk://#diff-a8e4192eb0bf59708c56d0a59c46f601684a554628105f70c50211784a1c5271L26) --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> (cherry picked from commit c9e6e79)
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
Author
|
run buildall |
Contributor
There was a problem hiding this comment.
Pull request overview
Cherry-picks the fix for a race condition in Set operator (INTERSECT/EXCEPT) runtime filter processing when using IN_FILTER, and adds regression coverage on branch-4.1.
Changes:
- Snapshot the build-side hash table size before enabling the probe pipeline, and use the snapshotted value during runtime filter processing to avoid races.
- Override
finishdependency()forSetSinkLocalStateso the pipeline can properly wait on the finish dependency used bysend_filter_size. - Add a nereids runtime_filter regression test (
set_operator_in_filter) and expected output.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| regression-test/suites/nereids_p0/runtime_filter/set_operator_in_filter.groovy | New regression suite reproducing the prior race scenario with IN_FILTER on INTERSECT/EXCEPT. |
| regression-test/data/nereids_p0/runtime_filter/set_operator_in_filter.out | Expected outputs for the new regression suite queries. |
| be/src/exec/operator/set_sink_operator.h | Exposes finish dependency via finishdependency() and stores a snapshotted build hash table size. |
| be/src/exec/operator/set_sink_operator.cpp | Uses snapshotted hash table size in RF processing and captures it before probe pipeline is enabled. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Cherry-pick of #62434 to branch-4.1.
Conflict resolution
Only file-location conflicts. branch-4.1 had renamed
regression-test/{data,suites}/query_p0/runtime_filter/toregression-test/{data,suites}/nereids_p0/runtime_filter/. Moved the two new regression files (set_operator_in_filter.groovyandset_operator_in_filter.out) to the renamednereids_p0/runtime_filter/location accordingly.be/src/exec/operator/set_sink_operator.{cpp,h}merged automatically. No code logic from #62434 was changed.Original PR
#62434