Skip to content

Expose ExecutionPlan statistics across the FFI boundary#22157

Open
mailmindlin wants to merge 9 commits into
apache:mainfrom
mailmindlin:feature/ffi-statistics
Open

Expose ExecutionPlan statistics across the FFI boundary#22157
mailmindlin wants to merge 9 commits into
apache:mainfrom
mailmindlin:feature/ffi-statistics

Conversation

@mailmindlin
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

ExecutionPlan::partition_statistics and TableProvider::statistics are not currently transported across the DataFusion FFI boundary, so foreign plans and providers always report Statistics::new_unknown / None. This blocks optimizer rules that depend on statistics (e.g. join reordering, partition pruning) from working with out-of-process plugins, which defeats the point of exposing those hooks to plugin authors.

Statistics contains Precision<ScalarValue> for column min/max/sum. ScalarValue is a large enum that's impractical to mirror in #[repr(C)], so I reuse the existing datafusion_proto_common::Statistics prost encoding — the same pattern this crate already uses for filter expressions.

What changes are included in this PR?

  • New datafusion_ffi::statistics module with [de]serialize_statistics helpers wrapping thedatafusion_proto_common::Statistics round-trip.
  • New partition_statistics field on FFI_ExecutionPlan and corresponding ExecutionPlan::partition_statistics impl on ForeignExecutionPlan
  • New statistics field on FFI_TableProvider and corresponding TableProvider::statistics impl on ForeignTableProvider. Since the trait returns Option<Statistics>, the implementation cannot propagate decode errors, it logs a log::warn! and triggers a debug_assert!.

This PR is expected to be merged after #22136 so it includes those changes.

Are these changes tested?

Yes:

  • Unit tests in statistics.rs cover three round-trip cases: Statistics::new_unknown, fully-exact statistics with ScalarValue::Int32/Int64/Utf8 min/max/sum, and mixed Precision::Exact/Inexact/Absent values.
  • A new round-trip integration test in execution_plan.rs exercises ForeignExecutionPlan::partition_statistics with both None and Some(idx) partitions, against a plan with no statistics (returns Statistics::new_unknown) and a plan with concrete statistics.
  • A new round-trip integration test in table_provider.rs uses a thin TableWithStats wrapper over MemTable to verify both the None path and the concrete Statistics path through ForeignTableProvider::statistics.

Are there any user-facing changes?

This is a breaking ABI change for the datafusion-ffi crate:

  • FFI_ExecutionPlan gains a partition_statistics field.
  • FFI_TableProvider gains a statistics field.

Plugins compiled against earlier versions of datafusion-ffi will need to be recompiled. There are no breaking changes to the Rust trait surface or to Statistics itself; downstream ExecutionPlan / TableProvider implementations require no changes.

@github-actions github-actions Bot added physical-expr Changes to the physical-expr crates ffi Changes to the ffi crate labels May 13, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 13, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-ffi v53.1.0 (current)
       Built [  58.694s] (current)
     Parsing datafusion-ffi v53.1.0 (current)
      Parsed [   0.058s] (current)
    Building datafusion-ffi v53.1.0 (baseline)
       Built [  58.172s] (baseline)
     Parsing datafusion-ffi v53.1.0 (baseline)
      Parsed [   0.058s] (baseline)
    Checking datafusion-ffi v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.228s] 222 checks: 220 pass, 1 fail, 1 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field FFI_ExecutionPlan.partition_statistics in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:82

--- warning repr_c_plain_struct_fields_reordered: struct fields reordered in repr(C) struct ---

Description:
A public repr(C) struct had its fields reordered. This can change the struct's memory layout, possibly breaking FFI use cases that depend on field position and order.
        ref: https://doc.rust-lang.org/reference/type-layout.html#reprc-structs
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/repr_c_plain_struct_fields_reordered.ron

Failed in:
  FFI_ExecutionPlan.clone moved from position 8 to 9, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:89
  FFI_ExecutionPlan.release moved from position 9 to 10, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:92
  FFI_ExecutionPlan.private_data moved from position 10 to 11, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:96
  FFI_ExecutionPlan.library_marker_id moved from position 11 to 12, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/execution_plan.rs:101
  FFI_TableProvider.logical_codec moved from position 6 to 7, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/table_provider.rs:145
  FFI_TableProvider.version moved from position 9 to 10, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/table_provider.rs:155
  FFI_TableProvider.library_marker_id moved from position 11 to 12, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/table_provider.rs:164

     Summary semver requires new major version: 1 major and 0 minor checks failed
     Warning produced 1 major and 0 minor level warnings
    Finished [ 120.349s] datafusion-ffi

@github-actions github-actions Bot added the auto detected api change Auto detected API change label May 13, 2026
@mailmindlin mailmindlin force-pushed the feature/ffi-statistics branch from 97fb89e to a171b9e Compare May 14, 2026 13:40
@github-actions github-actions Bot removed the physical-expr Changes to the physical-expr crates label May 14, 2026
@timsaucer timsaucer requested review from Copilot and timsaucer May 14, 2026 19:30
@timsaucer timsaucer added the api change Changes the API exposed to users of the crate label May 14, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR plumbs ExecutionPlan::partition_statistics and TableProvider::statistics across the datafusion-ffi boundary so out-of-process plugins can report real statistics to the planner instead of always returning Statistics::new_unknown/None. It reuses the existing datafusion_proto_common::Statistics prost encoding (the same approach already used for filter expressions) rather than mirroring ScalarValue in #[repr(C)].

Changes:

  • New datafusion_ffi::statistics module with serialize_statistics/deserialize_statistics helpers backed by datafusion-proto-common.
  • New partition_statistics function pointer on FFI_ExecutionPlan and corresponding ForeignExecutionPlan::partition_statistics implementation (Result-propagating).
  • New statistics function pointer on FFI_TableProvider and corresponding ForeignTableProvider::statistics implementation (logs + debug_assert! on decode error since the trait can only return Option).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
datafusion/ffi/src/lib.rs Registers the new statistics module.
datafusion/ffi/src/statistics.rs New helpers wrapping prost round-trip with three unit round-trip tests.
datafusion/ffi/src/execution_plan.rs Adds partition_statistics FFI field, wrapper, and ForeignExecutionPlan impl; extends EmptyExec test helper with with_statistics; adds a round-trip integration test.
datafusion/ffi/src/table_provider.rs Adds statistics FFI field, wrapper, and ForeignTableProvider::statistics impl with log::warn!/debug_assert! on decode failure; adds a TableWithStats-based round-trip test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Member

@timsaucer timsaucer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, but a few things worth addressing

/// `datafusion_proto_common::Statistics`.
pub partition_statistics: unsafe extern "C" fn(
plan: &Self,
partition: FFI_Option<u64>,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why FFI_Option instead of FFI_Option?

Copy link
Copy Markdown
Contributor Author

@mailmindlin mailmindlin May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what you mean

Comment on lines +493 to +496
#[cfg(test)]
use datafusion_common::stats::Precision;
#[cfg(test)]
use datafusion_common::{ColumnStatistics, ScalarValue};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is here because you get clippy warnings otherwise, then it's marginally cleaner to remove the use statements and just fully qualify these below.

Comment on lines +193 to +198
) -> FFI_Result<FFI_Option<SVec<u8>>> {
let serialized: Option<SVec<u8>> = provider
.inner()
.statistics()
.map(|s| serialize_statistics(&s).into_iter().collect());
FFI_Result::Ok(serialized.into())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why return a result? It looks infallible unless I missed something

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate auto detected api change Auto detected API change ffi Changes to the ffi crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose ExecutionPlan statistics across the FFI boundary

3 participants