[AURON #2321] Support Iceberg column rename and drop-then-add in the native scan #2322

Open
lyne7-sc wants to merge 4 commits into apache:master from lyne7-sc:fix/iceberg_rename

@lyne7-sc

Contributor

Which issue does this PR close?

Closes #2321

Rationale for this change

The native Iceberg scan matches data-file columns by name, but Iceberg tracks them by field-id. After a column rename, old files read as all-NULL; after a drop-then-add of the same name, the new column reads the old column's data.
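The two failure modes can be reproduced with a small model. This is only an illustrative sketch, not code from the PR: `FileColumn`, `Requested`, `resolve_by_name`, and `resolve_by_id` are hypothetical names, and a column that fails to resolve is assumed to be filled with NULLs by the reader.

```rust
/// A column as stored in a data file: the name it had at write time,
/// its Iceberg field-id, and its values. (Hypothetical model type.)
#[derive(Debug, Clone, PartialEq)]
struct FileColumn {
    name: &'static str,
    field_id: i32,
    values: Vec<Option<i64>>,
}

/// A column requested by the table's current schema. (Hypothetical model type.)
struct Requested {
    name: &'static str,
    field_id: i32,
}

/// Name-based resolution, case-insensitive. `None` stands for
/// "column missing in this file", which a reader would fill with NULLs.
fn resolve_by_name<'a>(file: &'a [FileColumn], want: &Requested) -> Option<&'a FileColumn> {
    file.iter().find(|c| c.name.eq_ignore_ascii_case(want.name))
}

/// Field-id-based resolution: the column name in the file is ignored.
fn resolve_by_id<'a>(file: &'a [FileColumn], want: &Requested) -> Option<&'a FileColumn> {
    file.iter().find(|c| c.field_id == want.field_id)
}
```

With a file written as `price` (id 2): after a rename to `amount` (still id 2), `resolve_by_name` finds nothing, so the column reads as NULL, while `resolve_by_id` finds the data. After dropping `price` and adding a new `price` (say id 3), `resolve_by_name` returns the old column's data, while `resolve_by_id` correctly finds nothing.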

What changes are included in this PR?

Resolve columns by Iceberg field-id instead of by name:

  • proto: add field_id to Field.
  • JVM (AuronIcebergSourceUtil, IcebergScanSupport, NativeConverters): extract the top-level name → field-id mapping from the scan's expectedSchema() and serialize it into the plan.
  • native (auron-planner, scan/mod.rs): stamp the id into Arrow field metadata (PARQUET:field_id); fields_match matches by id when present, else falls back to case-insensitive name matching (non-Iceberg scans unchanged).
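The matching rule in the last bullet could look roughly like the sketch below. It is not the PR's actual code: `Field` here is a simplified stand-in for an Arrow field (name plus string metadata), the `field_id` helper is hypothetical, and "when present" is assumed to mean that both sides carry a parseable `PARQUET:field_id`.

```rust
use std::collections::HashMap;

/// Metadata key under which the field-id is stamped.
const FIELD_ID_KEY: &str = "PARQUET:field_id";

/// Simplified stand-in for an Arrow field (assumption, not the arrow crate type).
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

impl Field {
    /// Build a field, stamping the id into metadata when one is given.
    fn new(name: &str, field_id: Option<i32>) -> Self {
        let mut metadata = HashMap::new();
        if let Some(id) = field_id {
            metadata.insert(FIELD_ID_KEY.to_string(), id.to_string());
        }
        Field { name: name.to_string(), metadata }
    }

    /// Read the field-id back; `None` if absent or not a valid integer.
    fn field_id(&self) -> Option<i32> {
        self.metadata.get(FIELD_ID_KEY)?.parse().ok()
    }
}

/// Match by field-id when both fields carry one (assumed reading of
/// "when present"); otherwise fall back to case-insensitive names.
fn fields_match(expected: &Field, file: &Field) -> bool {
    match (expected.field_id(), file.field_id()) {
        (Some(a), Some(b)) => a == b,
        _ => expected.name.eq_ignore_ascii_case(&file.name),
    }
}
```

Under this rule, scans that never stamp an id (the non-Iceberg case) always take the name branch, which is consistent with the bullet's claim that they are unchanged.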

Nested-struct evolution and ORC rename/drop fall back to Spark; additive evolution stays native.

Are there any user-facing changes?

Yes. Iceberg queries on renamed or drop-then-added columns now return correct results under the native scan. Unsupported cases fall back to Spark. No API change.

How was this patch tested?

Added cases to AuronIcebergIntegrationSuite.

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Development

Successfully merging this pull request may close these issues.

Native Iceberg scan returns wrong data after column rename / drop-then-add