Skip to content

[fix](pipeline) guard CountedFinishDependency::sub() against counter underflow#62815

Open
BiteTheDDDDt wants to merge 1 commit intoapache:masterfrom
BiteTheDDDDt:fix_counted_finish_dep_underflow
Open

[fix](pipeline) guard CountedFinishDependency::sub() against counter underflow#62815
BiteTheDDDDt wants to merge 1 commit intoapache:masterfrom
BiteTheDDDDt:fix_counted_finish_dep_underflow

Conversation

@BiteTheDDDDt
Copy link
Copy Markdown
Contributor

Related PR: N/A

Problem Summary:

CountedFinishDependency::sub() decrements an unsigned uint32_t _counter
without checking that the counter is greater than zero. The contract is that
each sub() is balanced by a prior add(), but if the protocol is ever
violated (e.g. a duplicated sync_filter_size RPC from FE, double callback
on the BE side, or any future bug) the 0-- silently underflows to
UINT32_MAX. The condition if (!_counter) set_ready(); is then never true
and the dependency stays blocked forever, hanging the entire query with no
log line and no crash, making the bug very hard to diagnose.

This is especially relevant because there are multiple callers of sub():

  • SyncSizeCallback::call() on RPC failure
  • SyncSizeCallback::call() on RPC success but error response status
  • RuntimeFilterProducer::set_synced_size() triggered by FE

Any imbalance between these and the single add() in
RuntimeFilterProducer::set_dependency() would produce the silent hang.

Add an explicit check at the top of sub() that throws an
Exception(ErrorCode::INTERNAL_ERROR, ...) carrying debug_string() when
_counter == 0. This matches the project convention of asserting
correctness and failing fast rather than letting the process limp along in
a corrupt state.

Release note

Fix a potential silent query hang caused by an unbalanced
CountedFinishDependency::sub() call underflowing the counter. Such cases
now throw an explicit INTERNAL_ERROR instead.

Check List (For Author)

  • Test:

    • Unit Test: added CountedFinishDependencyTest in
      be/test/exec/runtime_filter/sync_size_callback_test.cpp covering
      sub() on a fresh counter, sub() after balanced add/sub, and the happy
      path of multiple add() / sub() reaching ready. All pass.
    • Existing SyncSizeCallbackTest.* re-run, no regression.
  • Behavior changed: No (the only observable change is converting an existing
    silent failure into a loud error when the contract is violated)

  • Does this need documentation: No

…underflow

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: N/A

Problem Summary:

`CountedFinishDependency::sub()` decrements an unsigned `uint32_t _counter`
without checking that the counter is greater than zero. The contract is that
each `sub()` is balanced by a prior `add()`, but if the protocol is ever
violated (e.g. a duplicated `sync_filter_size` RPC from FE, double callback
on the BE side, or any future bug) the `0--` silently underflows to
`UINT32_MAX`. The condition `if (!_counter) set_ready();` is then never true
and the dependency stays blocked forever, hanging the entire query with no
log line and no crash, making the bug very hard to diagnose.

This is especially relevant because there are multiple callers of `sub()`:
- `SyncSizeCallback::call()` on RPC failure
- `SyncSizeCallback::call()` on RPC success but error response status
- `RuntimeFilterProducer::set_synced_size()` triggered by FE

Any imbalance between these and the single `add()` in
`RuntimeFilterProducer::set_dependency()` would produce the silent hang.

Add an explicit check at the top of `sub()` that throws an
`Exception(ErrorCode::INTERNAL_ERROR, ...)` carrying `debug_string()` when
`_counter == 0`. This matches the project convention of asserting
correctness and failing fast rather than letting the process limp along in
a corrupt state.

### Release note

Fix a potential silent query hang caused by an unbalanced
`CountedFinishDependency::sub()` call underflowing the counter. Such cases
now throw an explicit `INTERNAL_ERROR` instead.

### Check List (For Author)

- Test:
    - [x] Unit Test: added `CountedFinishDependencyTest` in
      `be/test/exec/runtime_filter/sync_size_callback_test.cpp` covering
      sub() on a fresh counter, sub() after balanced add/sub, and the happy
      path of multiple add() / sub() reaching ready. All pass.
    - [x] Existing `SyncSizeCallbackTest.*` re-run, no regression.

- Behavior changed: No (the only observable change is converting an existing
  silent failure into a loud error when the contract is violated)

- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 24, 2026 09:07
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the pipeline runtime-filter dependency mechanism by preventing CountedFinishDependency::sub() from silently underflowing its unsigned counter (which can otherwise leave the dependency permanently blocked and hang a query).

Changes:

  • Add an explicit underflow guard in CountedFinishDependency::sub() that throws an INTERNAL_ERROR when _counter == 0.
  • Add unit tests covering the new underflow behavior and verifying the normal balanced add/sub readiness behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
be/src/exec/pipeline/dependency.h Adds an underflow check in CountedFinishDependency::sub() to avoid silent hangs.
be/test/exec/runtime_filter/sync_size_callback_test.cpp Adds gtest cases validating underflow throws and balanced counter behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 195 to +203
void sub() {
std::unique_lock<std::mutex> l(_mtx);
// _counter is unsigned: a stray sub() when counter is already 0 would
// underflow to UINT32_MAX and the dependency would never become ready,
// hanging the query forever. Fail loudly instead.
if (_counter == 0) [[unlikely]] {
throw Exception(ErrorCode::INTERNAL_ERROR,
"CountedFinishDependency::sub() underflow on {}", debug_string());
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CountedFinishDependency::sub() now throws doris::Exception / uses ErrorCode::INTERNAL_ERROR in an inline method, but dependency.h doesn’t include the header that defines them. Please add the appropriate include (e.g. common/exception.h) here so this header compiles standalone and doesn’t rely on transitive includes.

Copilot uses AI. Check for mistakes.
@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

/review

@github-actions
Copy link
Copy Markdown
Contributor

OpenCode automated review failed and did not complete.

Error: Review step was failure (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/24882151710

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants