fix: propagate inner-field metadata through make_array and array_agg #22176
Open
CuteChuanChuan wants to merge 3 commits into
- Add `nullable_inner_field_from` / `nullable_list_item_field_from` helpers in `datafusion-common::utils` (built on `FieldExt::renamed`).
- Extend `SingleRowListArrayBuilder::with_field` to also propagate metadata.
- `make_array`: add `return_field_from_args`; thread inner `FieldRef` through new `array_array_with_field` runtime variant.
- `array_agg`: add `return_field` and `state_fields` overrides; all four accumulators (`ArrayAgg`, `Distinct`, `OrderSensitive`, `Groups`) now carry `FieldRef` instead of `DataType`, propagating metadata.
- Add SLT `array_metadata_propagation.slt` covering `make_array` and `array_agg`.
- Update memory-accounting tests for new struct layout.
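To illustrate what the helper bullets are getting at, here is a minimal std-only sketch. It is not the PR's code: `Field` below is a simplified stand-in for Arrow's field type, `list_item_field_from_type` and `uuid_field` are hypothetical names, and the body of `nullable_list_item_field_from` is only a guess at the behaviour described above (rename the inner field to the conventional `"item"`, force it nullable, keep data type and metadata).

```rust
use std::collections::BTreeMap;

// Simplified stand-in for an Arrow field; NOT the real `arrow_schema::Field`.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    data_type: String, // data type kept as a plain string for illustration
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

// Sketch of the old behaviour (hypothetical name): the inner field is rebuilt
// from the data type alone, so any metadata on the input field is lost.
fn list_item_field_from_type(data_type: &str) -> Field {
    Field {
        name: "item".to_string(),
        data_type: data_type.to_string(),
        nullable: true,
        metadata: BTreeMap::new(),
    }
}

// Sketch of the new behaviour (assumed, not copied from the PR): take the
// input field by value, rename it, force nullability, and keep the rest.
fn nullable_list_item_field_from(input: Field) -> Field {
    Field {
        name: "item".to_string(),
        nullable: true,
        ..input // data_type and metadata are carried over unchanged
    }
}

// Hypothetical input column tagged with the canonical UUID extension type.
fn uuid_field() -> Field {
    let mut metadata = BTreeMap::new();
    metadata.insert("ARROW:extension:name".to_string(), "arrow.uuid".to_string());
    Field {
        name: "id".to_string(),
        data_type: "FixedSizeBinary(16)".to_string(),
        nullable: false,
        metadata,
    }
}
```

With these definitions, `nullable_list_item_field_from(uuid_field())` still carries the `ARROW:extension:name` entry, while `list_item_field_from_type("FixedSizeBinary(16)")` does not, which is the difference a downstream type comparison would trip over.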
Which issue does this PR close?

- `make_array` and `array_agg`

Rationale for this change

SQL UDFs/UDAFs that wrap a column into a composite type (`List`, `Struct`, `Map`, ...) currently drop the input field's metadata when building the output's inner `Field`.

This breaks Arrow extension types (`ARROW:extension:*`): SQL-constructed lists silently lose extension-type identity, and any downstream comparison sees them as different types.

The fix needs both a planning-time hook (`return_field_from_args` / `return_field`) that propagates metadata, and runtime construction paths that thread the planned `FieldRef` through to the produced array.

What changes are included in this PR?

Helpers (`datafusion-common::utils`)
- `nullable_inner_field_from` / `nullable_list_item_field_from`: build canonical composite inner fields, preserving data type + metadata. Built on `FieldExt::renamed`, taking `FieldRef` by value so callers control cloning.
- `SingleRowListArrayBuilder::with_field` now propagates metadata too.

`make_array`
- Add `return_field_from_args`.
- New `array_array_with_field` runtime variant; old `array_array` becomes a thin shim for back-compat.

`array_agg`
- Add `return_field` and `state_fields` overrides.
- All four accumulators now carry `FieldRef` instead of `DataType`.

Are these changes tested?

Yes. `array_metadata_propagation.slt` covers `make_array` and `array_agg`.

Are there any user-facing changes?

Yes. `make_array` and `array_agg` now retain Arrow extension-type identity from their input fields.

Public API change: `ArrayAggAccumulator::try_new`, `DistinctArrayAggAccumulator::try_new`, `OrderSensitiveArrayAggAccumulator::try_new`, and `ArrayAggGroupsAccumulator::new` now take `&FieldRef` instead of `&DataType` / owned `DataType`.
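
For readers updating callers, the shape of the constructor change can be sketched roughly as follows. This is a simplified, std-only illustration and not the actual DataFusion types: `Field`, `FieldRef`, and `ArrayAggSketch` are hypothetical stand-ins, and the accumulator only collects `i64` values. The point it tries to show is that an accumulator holding a shared field reference can hand the planned inner field (metadata included) back at evaluation time, which is not possible when it only knows the data type.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Simplified stand-in for an Arrow field; NOT the real `arrow_schema::Field`.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

// Shared, cheaply clonable handle to a field (stand-in for Arrow's `FieldRef`).
type FieldRef = Arc<Field>;

// Hypothetical accumulator that stores the planned inner field rather than
// just a data type.
struct ArrayAggSketch {
    item_field: FieldRef,
    values: Vec<i64>,
}

impl ArrayAggSketch {
    // Takes `&FieldRef`; cloning the Arc only bumps a reference count.
    fn try_new(item_field: &FieldRef) -> Result<Self, String> {
        Ok(Self {
            item_field: Arc::clone(item_field),
            values: Vec::new(),
        })
    }

    // Appends one input value to the running list.
    fn update(&mut self, value: i64) {
        self.values.push(value);
    }

    // Returns the collected values together with the inner field that the
    // output list should use, so metadata survives to the result.
    fn evaluate(&self) -> (FieldRef, Vec<i64>) {
        (Arc::clone(&self.item_field), self.values.clone())
    }
}
```

In this sketch a caller that previously passed a data type would instead build (or reuse) the inner field once and pass a reference to it, for example `ArrayAggSketch::try_new(&field)`.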