
[python] Generate input changelogs from Python writer#7739

Open
junmuz wants to merge 4 commits into apache:master from junmuz:adding_changelog_producer

Conversation

@junmuz
Contributor

@junmuz junmuz commented Apr 29, 2026

Purpose

  • Add input changelog producer support to the Python Paimon library, including DataChangelogWriter, ManifestCommittable, and integration into the write/commit pipeline
  • Reject changelog-producer on tables without primary keys, since changelog generation requires PK-based deduplication
  • Decouple changelog file format from data file format via a new changelog-file.format config option
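To illustrate the decoupling in the last bullet, here is a minimal sketch of how a writer might pick the changelog file format. The helper name `resolve_changelog_format` is hypothetical and not part of pypaimon, and both the fallback to `file.format` and the `parquet` default are assumptions rather than behavior confirmed by this PR.

```python
from typing import Mapping

# Assumed default for the data file format; not confirmed by this PR.
ASSUMED_DEFAULT_FILE_FORMAT = "parquet"


def resolve_changelog_format(options: Mapping[str, str]) -> str:
    """Hypothetical helper: choose the format for changelog files.

    Uses 'changelog-file.format' when it is set; otherwise falls back to
    the data file format ('file.format'), which is an assumed behavior.
    """
    data_format = options.get("file.format", ASSUMED_DEFAULT_FILE_FORMAT)
    return options.get("changelog-file.format", data_format)


# Data files in ORC, changelog files in Avro.
print(resolve_changelog_format(
    {"file.format": "orc", "changelog-file.format": "avro"}
))
```

Leaving `changelog-file.format` without a default keeps existing tables unchanged: under the assumed fallback, they keep writing changelogs in the data file format unless the option is set explicitly.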

Tests

New tests are added. Python scripts were also run manually, and the generated changelogs were verified to be readable from a Flink SQL job.

Limitation

Changelogs are currently only generated for inserts.

@junmuz junmuz force-pushed the adding_changelog_producer branch from 932d471 to d93459e Compare April 29, 2026 13:06
@junmuz junmuz changed the title Generate input changelogs from Python writer [python] Generate input changelogs from Python writer Apr 29, 2026
CHANGELOG_FILE_FORMAT: ConfigOption[str] = (
    ConfigOptions.key("changelog-file.format")
    .string_type()
    .no_default_value()
Contributor Author

This has no default value, as defined in the docs.

Comment thread paimon-python/pypaimon/write/commit_message.py Outdated
@junmuz junmuz force-pushed the adding_changelog_producer branch from 1b1441e to 01bc7fb Compare April 29, 2026 15:10
@junmuz
Contributor Author

junmuz commented Apr 30, 2026

@JingsongLi @XiaoHongbo-Hope This PR adds support for generating changelogs for tables that use the input changelog producer (primarily for inserts). Could you review it?

@XiaoHongbo-Hope
Contributor

> @JingsongLi @XiaoHongbo-Hope The PR adds support for generating changelogs for input changelog producer table (primarily for inserts). Can you review it?

sure

@XiaoHongbo-Hope
Contributor

XiaoHongbo-Hope commented May 1, 2026

Could you add coverage for input changelog with row tracking enabled?

@junmuz
Contributor Author

junmuz commented May 1, 2026

> Could you add coverage for input changelog with row tracking enabled?

@XiaoHongbo-Hope Row tracking requires tables without primary keys (row-tracking.enabled is only valid on append-only tables with bucket=-1), while changelog-producer requires primary keys to be defined. These two features look mutually exclusive by design. I don't see a valid table configuration where both can be active simultaneously. The existing test test_reject_changelog_producer_on_append_only_table already verifies that we reject changelog-producer on tables without primary keys, which should cover all row-tracking tables.

@XiaoHongbo-Hope
Contributor

> Could you add coverage for input changelog with row tracking enabled?

> @XiaoHongbo-Hope Row tracking requires tables without primary keys (row-tracking.enabled is only valid on append-only tables with bucket=-1), while changelog-producer requires primary keys to be defined. These two features look mutually exclusive by design. I don't see a valid table configuration where both can be active simultaneously. The existing test test_reject_changelog_producer_on_append_only_table already verifies that we reject changelog-producer on tables without primary keys, which should cover all row-tracking tables.

This validation can still be bypassed because it only runs in Schema.from_pyarrow_schema(). Directly constructing Schema(fields=..., primary_keys=[], options={'changelog-producer': 'input'}) and passing it to catalog.create_table() still creates an append-only table with changelog-producer set.
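
One possible way to close this gap is to run the check at table creation, so every Schema goes through it regardless of how it was built. The sketch below uses hypothetical stand-in classes (`Schema`, `Catalog`, `validate_changelog_producer`), not the real pypaimon ones, and assumes that `none` means the changelog producer is disabled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Schema:
    """Hypothetical stand-in for the pypaimon Schema class."""
    fields: List[str]
    primary_keys: List[str] = field(default_factory=list)
    options: Dict[str, str] = field(default_factory=dict)


def validate_changelog_producer(schema: Schema) -> None:
    """Reject changelog-producer on tables without primary keys."""
    # Assumption: 'none' (or an unset option) means no changelog producer.
    producer = schema.options.get("changelog-producer", "none")
    if producer != "none" and not schema.primary_keys:
        raise ValueError(
            f"changelog-producer={producer!r} requires primary keys"
        )


class Catalog:
    """Hypothetical in-memory catalog used only for this sketch."""

    def __init__(self) -> None:
        self.tables: Dict[str, Schema] = {}

    def create_table(self, name: str, schema: Schema) -> None:
        # Validate here so directly constructed schemas are checked too.
        validate_changelog_producer(schema)
        self.tables[name] = schema
```

With this shape, Schema(fields=['id'], primary_keys=[], options={'changelog-producer': 'input'}) is rejected by create_table() with a ValueError, while the same options on a schema with primary_keys=['id'] are accepted.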


2 participants