
refactor: Update SortMergeJoin to use async spill abstractions and remove open_sync #22230

Draft: pantShrey wants to merge 1 commit into apache:main from pantShrey:smj-async-spill


@pantShrey

Note: This PR depends on #21882 (pluggable SpillFile trait) and cannot be merged before it. Opening in parallel per @alamb's suggestion for easier review. The SpillFile trait used here is defined in that base PR. To review locally, apply #21882 first and then stack this branch on top.

Which issue does this PR close?

Rationale for this change

materializing_stream.rs and bitwise_stream.rs were reading spilled batches via open_sync_reader or direct File::open calls, bypassing the SpillFile abstraction introduced in #21882. This PR migrates both to SpillManager::read_spill_as_stream, so custom backends (Postgres BufFile, object storage) can serve spill reads without requiring an OS file path.
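
As a rough illustration of why a stream-only read path lets non-file backends plug in, here is a std-only sketch. Everything in it is hypothetical: `Batch` stands in for arrow's `RecordBatch`, `BatchStream` for `futures::Stream`, and `SpillBackend`, `InMemorySpill` and `read_all_spilled` are illustrative names, not the SpillFile trait from #21882 or DataFusion's actual SpillManager API.

```rust
use std::collections::VecDeque;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for arrow's `RecordBatch` (assumption for this sketch).
type Batch = Vec<i64>;

/// Minimal std-only analogue of `futures::Stream`, yielding batches.
trait BatchStream {
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Batch>>;
}

/// Hypothetical backend trait: reading is only possible through a stream,
/// so callers never need an OS path or a synchronous `File`.
trait SpillBackend {
    fn read_as_stream(&self) -> Pin<Box<dyn BatchStream>>;
}

/// A backend with no filesystem at all; batches live in memory, standing in
/// for something like a Postgres BufFile or an object-store blob.
struct InMemorySpill {
    batches: Vec<Batch>,
}

struct QueueStream {
    queue: VecDeque<Batch>,
}

impl BatchStream for QueueStream {
    fn poll_next(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Batch>> {
        // Always ready: yields the next queued batch, then `None` at the end.
        Poll::Ready(self.get_mut().queue.pop_front())
    }
}

impl SpillBackend for InMemorySpill {
    fn read_as_stream(&self) -> Pin<Box<dyn BatchStream>> {
        Box::pin(QueueStream { queue: self.batches.clone().into() })
    }
}

/// Hypothetical caller (think of a join operator): it depends only on
/// `SpillBackend`, never on where or how the bytes are stored.
fn read_all_spilled(backend: &dyn SpillBackend) -> Vec<Batch> {
    struct NoopWake;
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let mut stream = backend.read_as_stream();
    let mut out = Vec::new();
    // Busy-poll to completion: acceptable in a sketch, not in a real executor.
    loop {
        match stream.as_mut().poll_next(&mut cx) {
            Poll::Ready(Some(batch)) => out.push(batch),
            Poll::Ready(None) => return out,
            Poll::Pending => continue,
        }
    }
}
```

Because the caller only ever sees a stream, swapping `InMemorySpill` for a file-backed or remote implementation would not change `read_all_spilled` at all.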

What changes are included in this PR?

  • materializing_stream.rs: Eagerly restores spilled BufferedBatches via async streams before freezing, avoiding new state machine variants.
  • bitwise_stream.rs: Replaces sync reads with an async poll_next_unpin loop, caching the stream to survive Poll::Pending.
  • spill_file.rs: Removes open_sync_reader from the SpillFile trait (no longer needed).
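
The "cache the stream to survive Poll::Pending" idea from the bitwise_stream.rs bullet can be sketched without any async runtime. This is a minimal std-only illustration, not the PR's code: `SpillReader`, `FlakyStream`, `poll_restore` and `restore_all` are hypothetical names, `Batch` stands in for `RecordBatch`, and a hand-written poll loop stands in for `poll_next_unpin`.

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for arrow's `RecordBatch` (assumption for this sketch).
type Batch = Vec<i64>;

/// Minimal std-only analogue of `futures::Stream`.
trait BatchStream {
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Batch>>;
}

/// Test stream that returns `Pending` before every item, mimicking slow async IO.
struct FlakyStream {
    batches: Vec<Batch>,
    ready: bool,
}

impl BatchStream for FlakyStream {
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Batch>> {
        let this = self.get_mut();
        if !this.ready {
            this.ready = true;
            cx.waker().wake_by_ref(); // ask the executor to poll again
            return Poll::Pending;
        }
        this.ready = false;
        if this.batches.is_empty() {
            Poll::Ready(None)
        } else {
            Poll::Ready(Some(this.batches.remove(0)))
        }
    }
}

/// Hypothetical reader state: the open stream lives in `cached`, so returning
/// `Pending` does not drop it and the next poll resumes where it left off.
struct SpillReader {
    spilled: Vec<Batch>, // stands in for the spill file contents
    cached: Option<Pin<Box<dyn BatchStream>>>,
    restored: Vec<Batch>,
    opens: usize, // how many times the spill was opened
}

impl SpillReader {
    fn new(spilled: Vec<Batch>) -> Self {
        Self { spilled, cached: None, restored: Vec::new(), opens: 0 }
    }

    /// Drives the spill stream; returns `Ready(())` once every batch is restored.
    fn poll_restore(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        if self.cached.is_none() {
            // Open only when no read is in flight. Reopening on every poll
            // would restart the read and duplicate batches.
            let stream: Pin<Box<dyn BatchStream>> =
                Box::pin(FlakyStream { batches: self.spilled.clone(), ready: false });
            self.cached = Some(stream);
            self.opens += 1;
        }
        let stream = self.cached.as_mut().expect("stream was opened above");
        loop {
            let polled = stream.as_mut().poll_next(cx);
            match polled {
                Poll::Ready(Some(batch)) => self.restored.push(batch),
                Poll::Ready(None) => {
                    self.cached = None; // finished: release the stream
                    return Poll::Ready(());
                }
                // Keep the stream cached and yield to the caller.
                Poll::Pending => return Poll::Pending,
            }
        }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polls until restoration completes; returns how many polls were `Pending`.
fn restore_all(reader: &mut SpillReader) -> usize {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut pending_polls = 0;
    while reader.poll_restore(&mut cx).is_pending() {
        pending_polls += 1;
    }
    pending_polls
}
```

With spilled batches `[1, 2]` and `[3]`, the reader sees three `Pending` polls, yet the spill is opened once and the batches come back once, in order. The materializing_stream.rs bullet applies the same idea at a coarser grain: drive the restore to completion before freezing, so no extra state-machine variant is needed for a half-restored batch.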

Are these changes tested?

Covered by existing SMJ tests. No new tests were added: the behavioral change is internal (sync → async IO path) and is observable only through custom backends, which are not yet in tree.

Are there any user-facing changes?

No. This PR removes open_sync_reader from the SpillFile trait, which is a breaking API change for anyone implementing the trait. However, the trait was introduced in #21882, which has not merged yet, so there are no external implementors.

@github-actions added the execution (Related to the execution crate) and physical-plan (Changes to the physical-plan crate) labels on May 15, 2026


Development

Successfully merging this pull request may close these issues.

Allow pluggable file backends in DiskManager and IPCStreamWriter to support non-OS file systems
