Add expression partitioning enum variant #22207
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -600,6 +600,11 @@ impl BatchPartitioner { | |
| num_input_partitions, | ||
| )) | ||
| } | ||
| Partitioning::Expr(_) => { | ||
| not_impl_err!( | ||
| "Expression partitioning is not supported by RepartitionExec" | ||
| ) | ||
| } | ||
|
Comment on lines +603 to +607
Contributor
So, it's worth discussing this in more detail I think.
In So this operator will be much more expensive than it might be otherwise. What is the reasoning around using expressions here, and not literally ranges?
Contributor
Author
My intent wasn't for In follow-ups:
Let me know thoughts on that 👍
Contributor
That works for the first join, but not for follow-up joins. For example, if you have a 3-table join, the first join will be able to use an equality match on range partitioning to say: no re-partitioning needed at all, because the two tables are partitioned the same way! Great. But it's very likely that the second join does need to re-partition one of its inputs (assuming different join keys between the two joins): the output of join one needs to be re-partitioned to match the third table. Now, technically you can just re-partition both sides (i.e. switch to hash or something). But if you instead re-partition to match the third table, then you might be able to significantly cut down on data movement. So, yes: I think it is important to be able to re-partition efficiently by this strategy. If we don't have concrete use cases for generic expression partitioning, then it would not be my first choice here.
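To make the cost argument concrete, here is a minimal sketch of what routing by literal ranges could look like: a binary search over sorted split points per row, with no expression evaluation. This is plain Rust, not DataFusion code. The names `range_partition` and `repartition_by_range`, and the use of `i64` keys, are assumptions made purely for illustration.

```rust
/// Hypothetical helper (not a DataFusion API): map a key to a partition index
/// given sorted split points. With split points [10, 20], keys below 10 go to
/// partition 0, keys in 10..20 go to partition 1, and keys >= 20 go to partition 2.
fn range_partition(split_points: &[i64], key: i64) -> usize {
    // `partition_point` binary-searches for the first split point that is
    // greater than `key`; its index is the target partition.
    split_points.partition_point(|&bound| bound <= key)
}

/// Hypothetical helper: route every key into one of `split_points.len() + 1`
/// output partitions, preserving input order within each partition.
fn repartition_by_range(split_points: &[i64], keys: &[i64]) -> Vec<Vec<i64>> {
    let mut partitions = vec![Vec::new(); split_points.len() + 1];
    for &key in keys {
        partitions[range_partition(split_points, key)].push(key);
    }
    partitions
}
```

If the output of the first join were routed with the third table's split points in this way, only that one side would need to move, which is the data-movement saving described above.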
||
| other => { | ||
| not_impl_err!("Unsupported repartitioning scheme {other:?}") | ||
| } | ||
|
|
@@ -1260,6 +1265,11 @@ impl ExecutionPlan for RepartitionExec { | |
| } | ||
| Partitioning::Hash(new_partitions, *size) | ||
| } | ||
| Partitioning::Expr(_) => { | ||
| return not_impl_err!( | ||
| "Expression partitioning is not supported for projection pushdown through RepartitionExec" | ||
| ); | ||
| } | ||
| others => others.clone(), | ||
| }; | ||
|
|
||
|
|
@@ -1296,6 +1306,11 @@ impl ExecutionPlan for RepartitionExec { | |
| if !self.maintains_input_order()[0] { | ||
| return Ok(SortOrderPushdownResult::Unsupported); | ||
| } | ||
| if matches!(self.partitioning(), Partitioning::Expr(_)) { | ||
| return not_impl_err!( | ||
| "Expression partitioning is not supported for sort pushdown through RepartitionExec" | ||
| ); | ||
| } | ||
|
|
||
| // Delegate to the child and wrap with a new RepartitionExec | ||
| self.input.try_pushdown_sort(order)?.try_map(|new_input| { | ||
|
|
@@ -1319,6 +1334,11 @@ impl ExecutionPlan for RepartitionExec { | |
| RoundRobinBatch(_) => RoundRobinBatch(target_partitions), | ||
| Hash(hash, _) => Hash(hash, target_partitions), | ||
| UnknownPartitioning(_) => UnknownPartitioning(target_partitions), | ||
| Expr(_) => { | ||
| return not_impl_err!( | ||
| "Expression partitioning is not supported for changing RepartitionExec partition counts" | ||
| ); | ||
| } | ||
| }; | ||
| Ok(Some(Arc::new(Self { | ||
| input: Arc::clone(&self.input), | ||
|
|
@@ -1447,6 +1467,11 @@ impl RepartitionExec { | |
| num_input_partitions, | ||
| ) | ||
| } | ||
| Partitioning::Expr(_) => { | ||
| return not_impl_err!( | ||
| "Expression partitioning is not supported by RepartitionExec" | ||
| ); | ||
| } | ||
| other => { | ||
| return not_impl_err!("Unsupported repartitioning scheme {other:?}"); | ||
| } | ||
|
|
@@ -1863,6 +1888,7 @@ mod tests { | |
| use datafusion_common_runtime::JoinSet; | ||
| use datafusion_execution::config::SessionConfig; | ||
| use datafusion_execution::runtime_env::RuntimeEnvBuilder; | ||
| use datafusion_physical_expr::ExprPartitioning; | ||
| use insta::assert_snapshot; | ||
|
|
||
| #[test] | ||
|
|
@@ -2155,6 +2181,34 @@ mod tests { | |
| ); | ||
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn unsupported_expr_partitioning() -> Result<()> { | ||
| let task_ctx = Arc::new(TaskContext::default()); | ||
| let batch = RecordBatch::try_from_iter(vec![( | ||
| "my_awesome_field", | ||
| Arc::new(StringArray::from(vec!["foo", "bar"])) as ArrayRef, | ||
| )])?; | ||
|
|
||
| let schema = batch.schema(); | ||
| let expr = col("my_awesome_field", &schema)?; | ||
| let input = MockExec::new(vec![Ok(batch)], Arc::clone(&schema)); | ||
| let partitioning = Partitioning::Expr(ExprPartitioning::new(vec![expr])); | ||
| let exec = RepartitionExec::try_new(Arc::new(input), partitioning)?; | ||
| let output_stream = exec.execute(0, task_ctx)?; | ||
|
|
||
| let result_string = crate::common::collect(output_stream) | ||
| .await | ||
| .unwrap_err() | ||
| .to_string(); | ||
| assert!( | ||
| result_string | ||
| .contains("Expression partitioning is not supported by RepartitionExec"), | ||
| "actual: {result_string}" | ||
| ); | ||
|
|
||
| Ok(()) | ||
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn error_for_input_exec() { | ||
| // This generates an error on a call to execute. The error | ||
|
|
||
Isn't this the same as range partitioning? See https://www.waitingforcode.com/apache-spark-sql/range-partitioning-apache-spark-sql/read#range_partitioning
Wouldn't it be better to use that naming?
https://dev.mysql.com/doc/refman/8.4/en/partitioning-range.html
https://www.dremio.com/wiki/range-partitioning/
I.e. this is a commonly used term.
Oh, I see the issue already refers to it as range partitioning. Is there a reason not to use that terminology here?
The reason is that we aim to be more flexible here. This can support range partitioning, but it also extends beyond that to any physical expr the source wants to provide. I just gave range in the description as one concrete example of how this could be used.
Someone could partition using this scheme on something like a city column where:
and so on.
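
A rough sketch of that idea, under several assumptions: each partition is described by a boolean predicate, a row goes to the first partition whose predicate matches, and a row that matches nothing gets no partition. Plain closures stand in for physical exprs here. `Predicate`, `expr_partition`, `example_city_predicates`, and the city names are all hypothetical, invented for illustration rather than taken from this PR.

```rust
/// Hypothetical stand-in for a boolean partitioning expression over a city value.
type Predicate = Box<dyn Fn(&str) -> bool>;

/// Return the index of the first predicate that matches `value`, or `None`
/// when no predicate matches (what should happen then is an open design choice).
fn expr_partition(predicates: &[Predicate], value: &str) -> Option<usize> {
    predicates.iter().position(|matches| matches(value))
}

/// Illustrative predicates only; the cities are made up for this sketch.
fn example_city_predicates() -> Vec<Predicate> {
    let mut predicates: Vec<Predicate> = Vec::new();
    // Partition 0: a single city.
    predicates.push(Box::new(|city: &str| city == "Paris"));
    // Partition 1: a group of cities, which a plain range cannot always express.
    predicates.push(Box::new(|city: &str| city == "Tokyo" || city == "Osaka"));
    // Partition 2: catch-all for everything else.
    predicates.push(Box::new(|_: &str| true));
    predicates
}
```

With these predicates, `expr_partition(&example_city_predicates(), "Osaka")` yields `Some(1)`. Note that this evaluates up to one predicate per partition for every row, which is the extra cost raised earlier in the review compared with a range lookup.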