Add expression partitioning enum variant#22207
Conversation
| /// NOTE: Optimizer and execution behavior for this partitioning is intentionally | ||
| /// not implemented and will be introduced incrementally. | ||
| #[derive(Debug, Clone)] | ||
| pub struct ExprPartitioning { |
There was a problem hiding this comment.
Isn't this the same as Range Partitioning https://www.waitingforcode.com/apache-spark-sql/range-partitioning-apache-spark-sql/read#range_partitioning
Wouldn't it be better to use that naming?
There was a problem hiding this comment.
https://dev.mysql.com/doc/refman/8.4/en/partitioning-range.html
https://www.dremio.com/wiki/range-partitioning/
I.e. this is a commonly used term.
There was a problem hiding this comment.
Oh I see the issue already refers to it as range partititioning. Any reason of why not using the terminology here?
There was a problem hiding this comment.
The reason is that we aim to be more flexible here. This can support Range partitioning but also extens beyond that to any physical expr the source wants to provide. I just gave range in the description as one concrete example of how this could be used.
Someone could partition using this scheme on something like city column where:
partition 1 -> city = "New York"
partition 2 -> city = "London"
and so on.
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
| Partitioning::Expr(_) => { | ||
| not_impl_err!( | ||
| "Expression partitioning is not supported by RepartitionExec" | ||
| ) | ||
| } |
There was a problem hiding this comment.
So, it's worth discussing this in more detail I think.
Expr partitioning is much, much more general than Range partitioning.
In Range partitioning, deciding which partition a row maps to involves either a binary search or sorted map lookup. But in Expr partitioning, it will always be a linear scan through the expressions, unless the consumer has reverse-engineered the fact that it is actually Range partitioning under the hood.
So this operator will be much more expensive than it might be otherwise.
What is the reasoning around using expressions here, and not literally ranges?
There was a problem hiding this comment.
My intent wasn't for ExprPartitioning to be efficient execution format for physically repartitioning rows. I was thinking of this as partitioning for sources/plans that already have known partitioning and declare it to preserve in the plan to unlock optimizations.
In follow-ups:
- add explicit compatibility/satisfaction APIs around this metadata we can ask structured questions without doing row-wise linear scans. This would eliminate uneeded repartitions in cases where different partitioning types satisfy one another.
- keep hash repartitioning as the preferred general execution path when DataFusion needs to repartition arbitrary input, unless we later add a more specialized repartitioning strategy.
Let me know thoughts on that 👍
There was a problem hiding this comment.
My intent wasn't for
ExprPartitioningto be efficient execution format for physically repartitioning rows. I was thinking of this as partitioning for sources/plans that already have known partitioning and declare it to preserve in the plan to unlock optimizations.
That works for the first join, but not for followup joins. For example:
If you have a 3 table join, the first join will be able to use an equality match on range partitioning to say: no re-partitioning needed at all because the two tables are partitioned the same way! Great.
But its very likely that the second join does need to re-partition one of its inputs (assuming different join keys between the two joins): the output of join one needs to be re-partitioned to match the third table. Now, technically you can just repartition both sides (i.e. switch to hash or something). But if you instead re-partition to match the third table, then you might be able to significantly cut down on data movement.
So, yes: I think that it is important to be able to efficiently re-partition by this strategy. If we don't have concrete use-cases for generic expression partitioning, then it would not be my first choice here.
Which issue does this PR close?
ExprPartitioningas described in thread: [DISCUSSION] Extending Partitioning to Support More Variants #21992.Rationale for this change
DataFusion currently cannot represent some partitioning schemes truthfully. For example, range-partitioned data currently advertises itself as
Partitioning::Hashonly to avoid repartitioning, which makes later optimizer decisions brittle.This PR introduces expression-based physical partitioning metadata so sources can eventually describe partition membership with predicates. This intentionally leaves optimizer and execution semantics unimplemented for follow-up PRs and to plan the shape of the partitioning API carefully.
What changes are included in this PR?
Partitioning::Expr(ExprPartitioning)to the physical partitioning enum.ExprPartitioning, representing one partition predicate expression per output partition.ExprPartitioningonly when all partition expressions can be remappedUnknownPartitioningnot_impl_err!at call-sites where expression partitioning semantics are not implemented yet.Are these changes tested?
Yes.
Are there any user-facing changes?
Yes, additive only. This adds a public physical partitioning variant and public type:
Partitioning::ExprExprPartitioning