[feature](variant) add variant_flatten scalar function #62825
Open
csun5285 wants to merge 1 commit into apache:master
Flatten a nested VARIANT row into a single-level JSON object keyed by
dot-joined paths to each leaf (NiFi FlattenJson keep-arrays semantics):
{"a":{"b":2}} -> {"a.b":2}
{"a":{"b":{"c":3}}} -> {"a.b.c":3}
{"a":[{"b":1}]} -> {"a":[{"b":1}]}
Return type is STRING so the result survives being written back to a
variant column without being re-structured. Nullability propagates from
the input.
The heavy lifting lives in ColumnVariant::serialize_one_row_flattened_to_string:
a single pass over the row's typed sub-columns, sparse-column paths and
doc-snapshot-column paths that emits "path":value for each leaf, with no
reparse step. In the default parser mode, array-of-objects inputs land as
one opaque sub-column leaf, so the flat walk naturally preserves
keep-arrays semantics. Under the legacy deprecated_enable_flatten_nested
mode, Variant columnarises the array and the flat walk emits elementwise
paths (e.g. {"a.b":[1, 2]}); the test suite locks in that documented
behavior.
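
The single-pass walk described above can be modelled roughly as follows. This is a Python sketch under stated assumptions, not the C++ code in ColumnVariant: the function name `emit_flat_row` and its three parameters are hypothetical, each leaf is assumed to arrive as a `(path, already_serialized_json_value)` pair, each path is assumed to appear in only one of the three sources, and the output order shown is just the iteration order of this model.

```python
import json
from itertools import chain


def emit_flat_row(typed_leaves, sparse_leaves, doc_snapshot_leaves):
    """Emit one flat JSON object from (path, serialized_value) leaf pairs.

    Values are already JSON text, so they are spliced in as-is and never
    parsed again; only the path is quoted as a JSON string key.
    """
    parts = []
    for path, value_json in chain(typed_leaves, sparse_leaves, doc_snapshot_leaves):
        parts.append(f"{json.dumps(path)}:{value_json}")
    return "{" + ",".join(parts) + "}"


# Default parser mode: the array of objects is a single opaque leaf at "a".
default_mode = emit_flat_row([("a", '[{"b":1},{"b":2}]')], [], [])

# Legacy mode (modelled): the array has been columnarised, so the leaf
# path is "a.b" and its value is the per-element array.
legacy_mode = emit_flat_row([("a.b", "[1, 2]")], [], [])

print(default_mode)
print(legacy_mode)
```

The same walk produces both shapes; the difference comes entirely from how the leaves were stored before the walk ran, which is the point the paragraph above makes about keep-arrays semantics.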
On the FE side, the function uses PropagateNullable + computeSignature to
preserve the input VariantType's predefinedFields through signature
resolution, following the same pattern as ElementAt.
BE unit tests build their test ColumnVariant per case (isolating each
case in its own column) to sidestep two Variant-layer interactions:
sub-column type unification across rows with heterogeneous leaf types,
and a pre-existing collision in pick_subcolumns_to_sparse_column's
std::map<string_view, Subcolumn> when two PathInData share the same
get_path() string. Regression baseline was generated via
`run-regression-test.sh -d variant_p0 -s regression_test_variant_flatten
-genOut`.
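
The path-string collision mentioned above can be illustrated with a toy model. This Python sketch is not the Doris code: it assumes, purely for illustration, that two distinct paths collide because one is a nested key pair and the other is a single key containing a literal dot; the actual cause inside pick_subcolumns_to_sparse_column may differ. The helper `joined` and both example paths are hypothetical.

```python
def joined(parts):
    """Stand-in for a path's string form: parts joined with dots."""
    return ".".join(parts)


nested = ("a", "b")   # key "b" inside object "a"
dotted = ("a.b",)     # one key whose name literally contains a dot

# Keyed by the joined string, both paths land on the same map key,
# so the container can hold only one entry for the two of them.
by_string = {joined(p): p for p in (nested, dotted)}

# Keyed by the structured parts, the two paths stay distinct.
by_parts = {p: joined(p) for p in (nested, dotted)}

print(len(by_string), len(by_parts))
```

Building a separate test column per case, as the unit tests do, keeps two such paths from ever sharing one container.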
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor: Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Contributor (Author): run buildall
Contributor (Author): /review
Contributor: OpenCode automated review failed and did not complete. Error: Review step was failure (possibly timeout or cancelled). Please inspect the workflow logs and rerun the review after the underlying issue is resolved.
Contributor: BE Regression && UT Coverage Report. Increment line coverage. Increment coverage report.
Contributor: FE Regression Coverage Report. Increment line coverage.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)