[python] Generate input changelogs from Python writer #7739
junmuz wants to merge 4 commits into apache:master
CHANGELOG_FILE_FORMAT: ConfigOption[str] = (
    ConfigOptions.key("changelog-file.format")
    .string_type()
    .no_default_value()
@JingsongLi @XiaoHongbo-Hope This PR adds support for generating changelogs for tables that use the input changelog producer (primarily for inserts). Could you review it?
sure
Could you add coverage for input changelog with row tracking enabled?
@XiaoHongbo-Hope Row tracking requires tables without primary keys (row-tracking.enabled is only valid on append-only tables with bucket=-1), while changelog-producer requires primary keys to be defined. These two features look mutually exclusive by design, and I don't see a valid table configuration where both can be active at the same time. The existing test test_reject_changelog_producer_on_append_only_table already verifies that we reject changelog-producer on tables without primary keys, which should cover all row-tracking tables.
This validation can still be bypassed because it only runs in Schema.from_pyarrow_schema(). Directly constructing Schema(fields=..., primary_keys=[], options={'changelog-producer': 'input'}) and passing it to catalog.create_table() still creates an append-only table with changelog-producer set.
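To illustrate the concern above, here is a minimal, self-contained sketch of running the check at table-creation time so that every construction path hits it. The Schema dataclass, validate_changelog_producer, and create_table below are hypothetical stand-ins, not the actual pypaimon classes or the code in this PR; the sketch also assumes that "none" is the value meaning no changelog producer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Schema:
    """Hypothetical stand-in for the table schema (not the real pypaimon class)."""
    fields: List[str]
    primary_keys: List[str] = field(default_factory=list)
    options: Dict[str, str] = field(default_factory=dict)


def validate_changelog_producer(schema: Schema) -> None:
    """Reject a changelog producer on a table without primary keys."""
    # Assumption: "none" means that no changelog producer is configured.
    producer = schema.options.get("changelog-producer", "none")
    if producer != "none" and not schema.primary_keys:
        raise ValueError(
            f"changelog-producer={producer!r} requires primary keys, "
            "but the schema defines none"
        )


def create_table(identifier: str, schema: Schema) -> str:
    """Hypothetical catalog entry point: validate here, not only in a factory."""
    # Running the check at creation time also covers schemas that were
    # constructed directly rather than through a from_pyarrow_schema-style helper.
    validate_changelog_producer(schema)
    return identifier
```

With the check placed in the creation path, a directly constructed Schema with primary_keys=[] and options={'changelog-producer': 'input'} is rejected, while the same options on a table with primary keys are accepted.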
Purpose
Tests
New tests have been added. The Python scripts were also run manually, and the generated changelogs were verified to be readable from a Flink SQL job.
Limitation
Changelogs are currently only generated for inserts.