[feature](variant) add variant_flatten scalar function #62825

Open
csun5285 wants to merge 1 commit into apache:master from csun5285:feat-variant-flatten

@csun5285
Contributor

@csun5285 csun5285 commented Apr 24, 2026

Flatten a nested VARIANT row into a single-level JSON object keyed by dot-joined paths to each leaf (NiFi FlattenJson keep-arrays semantics):

{"a":{"b":2}} -> {"a.b":2}
{"a":{"b":{"c":3}}} -> {"a.b.c":3}
{"a":[{"b":1}]} -> {"a":[{"b":1}]}

Return type is STRING so the result survives being written back to a variant column without being re-structured. Nullability propagates from the input.
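
The three examples above can be reproduced with a small reference sketch. This is not the Doris implementation; it only models the described semantics in Python. The name `variant_flatten_sketch` is hypothetical, and the treatment of empty objects and of non-object top-level values is an assumption the description does not cover.

```python
import json


def flatten_keep_arrays(value, prefix=""):
    """Flatten nested objects into dot-joined leaf paths.

    Arrays are treated as opaque leaves (keep-arrays semantics), so
    objects inside an array are not descended into.
    """
    flat = {}
    for key, child in value.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(child, dict) and child:
            flat.update(flatten_keep_arrays(child, path))
        else:
            # Scalars, arrays, and (by assumption) empty objects are leaves.
            flat[path] = child
    return flat


def variant_flatten_sketch(doc):
    """Hypothetical stand-in for variant_flatten: JSON text in, STRING out."""
    if doc is None:
        # Nullability propagates from the input.
        return None
    obj = json.loads(doc)
    if not isinstance(obj, dict):
        # Assumption: a non-object top-level value is returned unchanged.
        return json.dumps(obj, separators=(",", ":"))
    return json.dumps(flatten_keep_arrays(obj), separators=(",", ":"))


for doc in ('{"a":{"b":2}}', '{"a":{"b":{"c":3}}}', '{"a":[{"b":1}]}'):
    print(doc, "->", variant_flatten_sketch(doc))
```

Returning a string rather than a parsed object mirrors the STRING return type: the dotted keys stay literal keys instead of being interpreted as nested paths again.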

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

The heavy lifting lives in ColumnVariant::serialize_one_row_flattened_to_string:
a single pass over the row's typed sub-columns, sparse-column paths and
doc-snapshot-column paths that emits "path":value for each leaf, with no
reparse. Array-of-objects inputs in the default parser mode land as one
opaque sub-column leaf, so the flat walk naturally preserves keep-arrays
semantics. Under the legacy deprecated_enable_flatten_nested mode, Variant
columnarises the array, and the flat walk emits elementwise paths (e.g.
{"a.b":[1, 2]}); the test suite pins that documented behavior.
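
To make the difference between the two modes concrete, here is a rough Python model of the elementwise output in the legacy mode. It is only a sketch: `flatten_elementwise` is a hypothetical name, the input `{"a":[{"b":1},{"b":2}]}` is an assumed source for the `{"a.b":[1, 2]}` example, and the real code walks Variant sub-columns rather than parsed JSON.

```python
def flatten_elementwise(value, prefix=""):
    """Model of the legacy columnarised walk (illustrative only).

    An array of objects is turned into one array per sub-path, so
    {"a": [{"b": 1}, {"b": 2}]} yields {"a.b": [1, 2]}.
    """
    flat = {}
    for key, child in value.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(child, dict):
            flat.update(flatten_elementwise(child, path))
        elif isinstance(child, list) and child and all(
            isinstance(elem, dict) for elem in child
        ):
            # Collect each element's leaves into per-path arrays.
            # Limitation of this sketch: elements with differing keys
            # produce arrays of different lengths.
            columns = {}
            for elem in child:
                for sub_path, leaf in flatten_elementwise(elem, path).items():
                    columns.setdefault(sub_path, []).append(leaf)
            flat.update(columns)
        else:
            flat[path] = child
    return flat


print(flatten_elementwise({"a": [{"b": 1}, {"b": 2}]}))
```

In the default parser mode the same array would stay under the single key `a`, as in the keep-arrays examples earlier in the description.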

FE side uses PropagateNullable + computeSignature to preserve the input
VariantType's predefinedFields through signature resolution, aligning
with ElementAt's pattern.

BE unit tests build their test ColumnVariant per case (isolating each
case in its own column) to sidestep two Variant-layer interactions:
sub-column type unification across rows with heterogeneous leaf types,
and a pre-existing collision in pick_subcolumns_to_sparse_column's
std::map<string_view, Subcolumn> when two PathInData share the same
get_path() string. Regression baseline was generated via
`run-regression-test.sh -d variant_p0 -s regression_test_variant_flatten
-genOut`.
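
The path-string collision mentioned above can be illustrated in isolation. The sketch below is Python, not the BE code, and it assumes the collision arises when two different part sequences join to the same dotted string; the actual cause inside PathInData may differ (for example, paths that differ only in per-part metadata).

```python
def path_string(parts):
    """Join path parts with dots, standing in for a get_path()-style string."""
    return ".".join(parts)


nested = ("a", "b")   # key "b" inside object "a"
literal = ("a.b",)    # a single key that itself contains a dot

# Keyed by the joined string: both paths map to "a.b", so one entry is lost.
by_string = {}
for parts, value in ((nested, 1), (literal, 2)):
    by_string[path_string(parts)] = value

# Keyed by the part sequence: the two paths stay distinct.
by_parts = {nested: 1, literal: 2}

print(len(by_string), len(by_parts))
```

Building a separate column per test case avoids placing two such paths in the same map in the first place.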

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Thearas
Contributor

Thearas commented Apr 24, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@csun5285
Contributor Author

run buildall

@csun5285
Contributor Author

/review

@github-actions
Contributor

OpenCode automated review failed and did not complete.

Error: Review step was failure (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/24890200960

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 94.00% (94/100) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.49% (26810/37501)
Line Coverage 53.87% (280122/519959)
Region Coverage 47.26% (215525/456005)
Branch Coverage 50.55% (97518/192903)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 28.89% (13/45) 🎉
Increment coverage report
Complete coverage report
