Skip to content

fix: Nested self-referential CASE chains should not cause exponential hashing work during physical planning.#22175

Merged
Dandandan merged 3 commits into
apache:mainfrom
coralogix:brent/case-hash-fix
May 14, 2026
Merged

fix: Nested self-referential CASE chains should not cause exponential hashing work during physical planning.#22175
Dandandan merged 3 commits into
apache:mainfrom
coralogix:brent/case-hash-fix

Conversation

@avantgardnerio
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

Explained in issue

What changes are included in this PR?

  1. Removed derived Hash & Eq
  2. implemented manual Hash & Eq
  3. manual impl excludes redudant & recursive hash eval_method field

Are these changes tested?

A unit test was added

Are there any user-facing changes?

Pathologically nested CASE/WHEN queries will plan significantly faster.

avantgardnerio and others added 2 commits May 14, 2026 10:17
Nested self-referential CASE chains cause exponential hash work during
physical planning because CaseExpr derives Hash over both the original
CaseBody and the derived ProjectedCaseBody in EvalMethod.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EvalMethod (including ProjectedCaseBody) is deterministically derived
from CaseBody in try_new(), so hashing it is redundant. With the
derived Hash, nested CASE chains caused exponential hash work during
ProjectionExec::compute_properties because each level doubled the
tree traversal. Manual Hash/Eq on body-only fixes this.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label May 14, 2026
Comment thread datafusion/physical-expr/src/expressions/case.rs Outdated
@rluvaton rluvaton changed the title Brent/case hash fix fix: Nested self-referential CASE chains should not cause exponential hashing work during physical planning. May 14, 2026
@avantgardnerio
Copy link
Copy Markdown
Contributor Author

@alamb this doesn't look related to our change... do you know who to talk to about CI issues?

Run cargo test --profile ci -p datafusion-ffi --lib --tests --features integration-tests
error: error: unexpected argument 'test' found


// eval_method is functionally derived from body, so excluding it from
// Hash/Eq avoids redundantly hashing the expression tree twice. For
// nested CASE chains this prevents exponential blowup (see #22173).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// nested CASE chains this prevents exponential blowup (see #22173).
// nested CASE chains this prevents exponential blowup (see https://github.com/apache/datafusion/issues/22173).

@Dandandan
Copy link
Copy Markdown
Contributor

@alamb this doesn't look related to our change... do you know who to talk to about CI issues?

Run cargo test --profile ci -p datafusion-ffi --lib --tests --features integration-tests
error: error: unexpected argument 'test' found

Restarted - seems some flaky thing

@avantgardnerio avantgardnerio added this pull request to the merge queue May 14, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 14, 2026
@alamb
Copy link
Copy Markdown
Contributor

alamb commented May 14, 2026

FYI @pepijnve

I wonder if there is any way to test for this (a benchmark in sql_planner perhaps) so we don't break it in the future

@Dandandan Dandandan added this pull request to the merge queue May 14, 2026
@pepijnve
Copy link
Copy Markdown
Contributor

Dang completely missed that. Would it make sense to use something like derivative to be able to get auto generated Eq and Hash but exclude certain fields?

Merged via the queue into apache:main with commit 41ed40a May 14, 2026
61 of 62 checks passed
@Dandandan Dandandan deleted the brent/case-hash-fix branch May 14, 2026 18:53
@alamb
Copy link
Copy Markdown
Contributor

alamb commented May 14, 2026

derivative

I think in general we are trying to keep our (already quite extensive) dependency chain low

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Physical planning CPU blowup hashing nested CASE expressions

5 participants