Skip to content

feat: eliminate GlobalLimitExec when input statistics prove limit is already satisfied#22150

Open
xiedeyantu wants to merge 3 commits into
apache:mainfrom
xiedeyantu:limit
Open

feat: eliminate GlobalLimitExec when input statistics prove limit is already satisfied#22150
xiedeyantu wants to merge 3 commits into
apache:mainfrom
xiedeyantu:limit

Conversation

@xiedeyantu
Copy link
Copy Markdown
Member

Which issue does this PR close?

  • Closes #.

Rationale for this change

GlobalLimitExec (and LocalLimitExec) are sometimes redundant: if the input can be proven via exact statistics to produce no more rows than the fetch value, the limit node does nothing and should be removed entirely.

Previously, the LimitPushdown rule had no mechanism to eliminate such trivially-satisfied limits. A query like SELECT * FROM (VALUES ...) LIMIT 10 — where the input is a single-row PlaceholderRowExec — still carried an unnecessary GlobalLimitExec in the physical plan. Similarly, a LIMIT N over an EmptyExec or any zero-row plan was retained.

What changes are included in this PR?

  • Adds limit_satisfied_by_input() in limit_pushdown.rs: checks whether a plan's child provably produces at most fetch rows (requires skip == 0 and a single output partition).
  • Adds limit_eliminable_exact_num_rows(): iteratively unwraps ProjectionExec wrappers and recognises EmptyExec (0 rows), PlaceholderRowExec (1 row), and any plan reporting Precision::Exact(0) rows as eliminable producers.
  • When a limit is statically satisfied, marks global_state.satisfied = true and returns early — without resetting fetch/skip — so nested limit nodes still receive the correct outer constraints to merge against.
  • Updates the merges_local_limit_with_local_limit snapshot: the result is now bare EmptyExec (limit eliminated).
  • Updates union.slt: ProjectionExec over PlaceholderRowExec (1 row) with fetch=3 no longer carries a redundant GlobalLimitExec.
  • Adds explain_tree.slt test: SELECT count(*) … LIMIT 10 over a two-row VALUES clause is correctly reduced to ProjectionExec → PlaceholderRowExec with no limit node.
  • Updates copy.slt: fetch=10 is now correctly pushed all the way down to DataSourceExec.

Are these changes tested?

Yes.

  • cargo fmt --all
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test -p datafusion-core --test physical_optimizer limit
  • cargo test --features backtrace,parquet_encryption --profile ci --package datafusion-sqllogictest --test sqllogictests -- copy.slt union.slt explain_tree.slt

Are there any user-facing changes?

No API changes. Physical plans for queries with LIMIT over statically small inputs (EmptyExec, PlaceholderRowExec, or zero-row tables) will now have the redundant GlobalLimitExec/LocalLimitExec nodes eliminated, resulting in simpler and slightly more efficient plans.

@github-actions github-actions Bot added optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels May 13, 2026
Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiedeyantu
Thanks for the cleanup and improvements here. I went through the changes and only had one small suggestion around clarifying the invariant in the helper logic.

Ok(num_rows <= fetch)
}

/// Returns exact row counts only for operators whose cardinality guarantee is
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The helper comment mentions returning exact row counts for operators whose guarantees are strong enough, but the implementation is intentionally a pretty small whitelist (EmptyExec, PlaceholderRowExec, and exact zero stats). It might help to clarify in the comment that this is meant to be a conservative whitelist.

Another option would be adding a small test or note explaining why a general Precision::Exact(n <= fetch) is not considered safe enough here. I think that would make the invariant a bit clearer for future extensions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the related description. Frankly, I am unsure whether Precision::Exact(n <= fetch) can serve as a reliable basis; specifically, if a FILTER is present, would the row count become imprecise? Therefore, I opted for a conservative approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants