Optimize away unused UNNEST under duplicate-insensitive aggregates #22161

Open
kosiew wants to merge 3 commits into apache:main from kosiew:unnested-pruned-02-20118

kosiew commented May 14, 2026

Which issue does this PR close?

Rationale for this change

This change implements a conservative optimization for removing unused UNNEST operators in cases where the parent operator is duplicate-insensitive and the unnested output is not referenced.

Previously, a query that applied GROUP BY or DISTINCT to non-unnested columns still retained UNNEST in its plan, even when UNNEST only introduced duplicate rows and had no effect on the final grouped result. Removing UNNEST is safe only in narrowly scoped cases, however, because an empty or NULL list removes its input row entirely and therefore changes row cardinality.

This PR adds a targeted optimization that only removes UNNEST when it is provably semantics-preserving.
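The cardinality concern can be illustrated with a minimal, self-contained Rust sketch. It does not use DataFusion: `Row`, `unnest`, `distinct_ids_with_unnest`, and `distinct_ids_without_unnest` are hypothetical names invented for this illustration, `None` stands in for a NULL list, and the toy `unnest` follows the behavior described above, where empty and NULL lists produce no output rows.

```rust
use std::collections::BTreeSet;

/// Toy input row: an `id` column plus a list column.
/// `None` models a NULL list (an assumption of this sketch).
#[derive(Clone, Debug)]
struct Row {
    id: i64,
    list: Option<Vec<i64>>,
}

/// Toy UNNEST: emits one `(id, element)` pair per list element.
/// A row whose list is NULL or empty contributes no output rows.
fn unnest(rows: &[Row]) -> Vec<(i64, i64)> {
    rows.iter()
        .flat_map(|r| {
            let id = r.id;
            let elems = r.list.clone().unwrap_or_default();
            elems.into_iter().map(move |e| (id, e))
        })
        .collect()
}

/// `GROUP BY id` with no aggregate expressions, evaluated above the UNNEST.
fn distinct_ids_with_unnest(rows: &[Row]) -> BTreeSet<i64> {
    unnest(rows).into_iter().map(|(id, _)| id).collect()
}

/// The same `GROUP BY id` with the UNNEST removed from the plan.
fn distinct_ids_without_unnest(rows: &[Row]) -> BTreeSet<i64> {
    rows.iter().map(|r| r.id).collect()
}
```

When every list is non-empty, the two functions return the same set of ids, because UNNEST only duplicates rows and grouping discards the duplicates. Once any list is empty or NULL, the version with UNNEST loses that row's id, so dropping the operator would change the result.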

What changes are included in this PR?

  • Extend projection optimization for Aggregate plans to detect removable UNNEST inputs.

  • Add logic to eliminate LogicalPlan::Unnest when:

    • the unnested columns are not referenced by required expressions,
    • the parent aggregate is duplicate-insensitive (GROUP BY with no aggregate expressions, including DISTINCT),
    • the UNNEST is proven to emit at least one row for every input row (for example, its input is a non-empty literal list).
  • Support pruning through an intermediate Projection.

  • Add safety checks to avoid removing UNNEST when:

    • grouped expressions reference unnested columns,
    • the input list may be empty or NULL.
  • Add helper logic for detecting non-empty literal list inputs across supported list scalar types.
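The conditions listed above can be sketched as a single predicate over a toy plan. This is not the PR's implementation and does not use DataFusion's `LogicalPlan` API: `Plan`, `ListInput`, `Aggregate`, `is_non_empty_literal`, and `can_prune_unnest` are all hypothetical names, columns are plain strings, and the projection check is deliberately more conservative than a real optimizer would need to be.

```rust
use std::collections::HashSet;

/// Toy stand-in for the list expression fed to UNNEST.
#[derive(Debug)]
enum ListInput {
    /// A literal list known at plan time; `None` models a NULL literal.
    Literal(Option<Vec<i64>>),
    /// A column or other expression whose lists are unknown at plan time.
    Dynamic,
}

/// Toy logical plan with just enough shape for this check.
#[allow(dead_code)]
#[derive(Debug)]
enum Plan {
    Scan,
    Projection { columns: Vec<String>, input: Box<Plan> },
    Unnest { input: Box<Plan>, list: ListInput, output_columns: Vec<String> },
}

/// Toy aggregate node sitting directly above `input`.
struct Aggregate {
    group_by_columns: Vec<String>,
    aggregate_exprs: Vec<String>,
    input: Plan,
}

/// True only for a literal list that is neither NULL nor empty.
fn is_non_empty_literal(list: &ListInput) -> bool {
    matches!(list, ListInput::Literal(Some(items)) if !items.is_empty())
}

/// Returns true when the UNNEST below `agg` could be dropped in this toy model.
fn can_prune_unnest(agg: &Aggregate) -> bool {
    // Duplicate-insensitive parent: GROUP BY with no aggregate expressions.
    if !agg.aggregate_exprs.is_empty() {
        return false;
    }
    // Columns the plan above the UNNEST still needs.
    let mut required: Vec<&str> =
        agg.group_by_columns.iter().map(String::as_str).collect();
    // Look through at most one intermediate Projection.
    let candidate = match &agg.input {
        Plan::Projection { columns, input } => {
            required.extend(columns.iter().map(String::as_str));
            &**input
        }
        other => other,
    };
    let (list, output_columns) = match candidate {
        Plan::Unnest { list, output_columns, .. } => (list, output_columns),
        _ => return false,
    };
    // Keep the UNNEST if any required column is one it produces.
    let unnested: HashSet<&str> =
        output_columns.iter().map(String::as_str).collect();
    if required.iter().any(|c| unnested.contains(c)) {
        return false;
    }
    // Keep the UNNEST unless every input row is known to survive it.
    is_non_empty_literal(list)
}
```

Every check fails closed: anything the sketch cannot prove (a dynamic list, an unexpected plan shape, an aggregate expression) keeps the UNNEST in place, which matches the conservative intent stated in the rationale.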

Are these changes tested?

Yes.

Added targeted optimizer unit tests covering:

  • removal of unused non-empty literal UNNEST under GROUP BY,
  • removal through an intermediate projection,
  • preservation when unnested columns are referenced,
  • preservation for empty list inputs.

Added sqllogictest coverage for:

  • GROUP BY pruning,
  • DISTINCT pruning,
  • unsafe cases where removing UNNEST would change cardinality.

Are there any user-facing changes?

This change improves logical and physical plan optimization for certain queries involving unused UNNEST expressions under GROUP BY or DISTINCT, but does not introduce user-facing API changes.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 3 commits May 14, 2026 11:45
- Pruned unused Unnest under aggregate/group-by when safe.
- Handled direct Unnest and Projection -> Unnest scenarios.
- Ensured Unnest is retained when empty/null lists may drop rows.
- Added unit tests to cover these changes.
- Updated safe GROUP BY explain in sqllogictest: no Unnest/UnnestExec.
- Added regression tests for empty/null grouped scenarios to ensure Unnest retention.
- Added an optimizer unit test, `keep_referenced_unnest_under_group_by`, verifying that UNNEST is kept when GROUP BY expressions reference unnested columns.
- Implemented SLT DISTINCT regression test for `SELECT DISTINCT id ... UNNEST(make_array(...))`, ensuring the correctness of results and that UNNEST is pruned via the distinct-to-aggregate path.
- Removed Vec allocation for unnested input indices to enhance performance.
- Replaced try_fold boolean flow with a more explicit for loop for better readability.
- Introduced is_unnested_input_index function to clarify input index processing.
- Simplified logic for handling empty lists in .all() calls.
- Added has_valid_first_value helper for validation purposes.
- Introduced new test helpers: id_schema, list_literal_expr, and id_elem_unnest_plan to facilitate testing.
- Streamlined repeated optimizer test setup for efficiency.
- Improved construction of empty-list tests for clarity.
github-actions bot added the optimizer (Optimizer rules) and sqllogictest (SQL Logic Tests (.slt)) labels May 14, 2026
kosiew marked this pull request as ready for review May 14, 2026 04:57