Skip to content

[adapter] Detect concurrent dependency mutation in PlanValidity#37078

Open
mtabebe wants to merge 1 commit into
MaterializeInc:mainfrom
mtabebe:ma/sql-272
Open

[adapter] Detect concurrent dependency mutation in PlanValidity#37078
mtabebe wants to merge 1 commit into
MaterializeInc:mainfrom
mtabebe:ma/sql-272

Conversation

@mtabebe

@mtabebe mtabebe commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Problem:
ALTER CONNECTION ... ROTATE KEYS is a read-modify-write with a wide window between the catalog read and the catalog write. The slow steps in the middle are SSH key generation and secret-store I/O, all running in a spawned task. If ALTER CONNECTION SET (...) commits inside that window, the blind catalog write at the end of ROTATE overwrites the SET's change with the pre-SET create_sql. The SET silently disappears.

Note:
ROTATE KEYS is one of the few DDLs exempt from the global serialized_ddl lock (see must_serialize_ddl in command_handler.rs), which is what opens the race window in the first place. Other ALTER CONNECTION variants take the lock and can't race against themselves.

Solution:
PlanValidity already runs at every off-thread to on-thread hop and catches dropped dependencies. Extend it to also catch content changes: capture a hash of each dependency's create_sql at plan time, compare against the live hash in check, and return a new ConcurrentDependencyMutation error (SQLSTATE 40001) on mismatch. The user retries the statement.

PlanValidity::new and extend_dependencies now take &Catalog so they can read entries to hash; all callsites are updated.

Testing:
Regression test in test/race-condition-rotate-keys/mzcompose.py runs ROTATE and SET concurrently with a 1ms stagger. It asserts no silent revert across the loop and that the OCC check fires at least once, so the test fails loudly if the race window ever moves out from under it.

Closes SQL-272.

@mtabebe mtabebe force-pushed the ma/sql-272 branch 3 times, most recently from b389bf0 to 656cd63 Compare June 16, 2026 19:59
Problem:
`ALTER CONNECTION ... ROTATE KEYS` is a read-modify-write with a wide
window between the catalog read and the catalog write. The slow steps
in the middle are SSH key generation and secret-store I/O, all running
in a spawned task. If `ALTER CONNECTION SET (...)` commits inside that
window, the blind catalog write at the end of ROTATE overwrites the
SET's change with the pre-SET `create_sql`. The SET silently disappears.

Note:
ROTATE KEYS is one of the few DDLs exempt from the global
`serialized_ddl` lock (see `must_serialize_ddl` in `command_handler.rs`),
which is what opens the race window in the first place. Other ALTER
CONNECTION variants take the lock and can't race against themselves.

Solution:
`PlanValidity` already runs at every off-thread to on-thread hop and
catches dropped dependencies. Extend it to also catch content changes:
capture a hash of each dependency's `create_sql` at plan time, compare
against the live hash in `check`, and return a new
`ConcurrentDependencyMutation` error (SQLSTATE 40001) on mismatch. The
user retries the statement.

`PlanValidity::new` and `extend_dependencies` now take `&Catalog` so
they can read entries to hash; all callsites are updated.

Testing:
Regression test in `test/race-condition-rotate-keys/mzcompose.py` runs
ROTATE and SET concurrently with a 1ms stagger. It asserts no silent
revert across the loop and that the OCC check fires at least once, so
the test fails loudly if the race window ever moves out from under it.

Closes SQL-272.
@mtabebe mtabebe marked this pull request as ready for review June 16, 2026 22:27
@mtabebe mtabebe requested review from a team as code owners June 16, 2026 22:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant