[adapter] Detect concurrent dependency mutation in PlanValidity#37078
Open
mtabebe wants to merge 1 commit into
Open
[adapter] Detect concurrent dependency mutation in PlanValidity#37078mtabebe wants to merge 1 commit into
mtabebe wants to merge 1 commit into
Conversation
b389bf0 to
656cd63
Compare
Problem: `ALTER CONNECTION ... ROTATE KEYS` is a read-modify-write with a wide window between the catalog read and the catalog write. The slow steps in the middle are SSH key generation and secret-store I/O, all running in a spawned task. If `ALTER CONNECTION SET (...)` commits inside that window, the blind catalog write at the end of ROTATE overwrites the SET's change with the pre-SET `create_sql`. The SET silently disappears. Note: ROTATE KEYS is one of the few DDLs exempt from the global `serialized_ddl` lock (see `must_serialize_ddl` in `command_handler.rs`), which is what opens the race window in the first place. Other ALTER CONNECTION variants take the lock and can't race against themselves. Solution: `PlanValidity` already runs at every off-thread to on-thread hop and catches dropped dependencies. Extend it to also catch content changes: capture a hash of each dependency's `create_sql` at plan time, compare against the live hash in `check`, and return a new `ConcurrentDependencyMutation` error (SQLSTATE 40001) on mismatch. The user retries the statement. `PlanValidity::new` and `extend_dependencies` now take `&Catalog` so they can read entries to hash; all callsites are updated. Testing: Regression test in `test/race-condition-rotate-keys/mzcompose.py` runs ROTATE and SET concurrently with a 1ms stagger. It asserts no silent revert across the loop and that the OCC check fires at least once, so the test fails loudly if the race window ever moves out from under it. Closes SQL-272.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem:
ALTER CONNECTION ... ROTATE KEYSis a read-modify-write with a wide window between the catalog read and the catalog write. The slow steps in the middle are SSH key generation and secret-store I/O, all running in a spawned task. IfALTER CONNECTION SET (...)commits inside that window, the blind catalog write at the end of ROTATE overwrites the SET's change with the pre-SETcreate_sql. The SET silently disappears.Note:
ROTATE KEYS is one of the few DDLs exempt from the global
serialized_ddllock (seemust_serialize_ddlincommand_handler.rs), which is what opens the race window in the first place. Other ALTER CONNECTION variants take the lock and can't race against themselves.Solution:
PlanValidityalready runs at every off-thread to on-thread hop and catches dropped dependencies. Extend it to also catch content changes: capture a hash of each dependency'screate_sqlat plan time, compare against the live hash incheck, and return a newConcurrentDependencyMutationerror (SQLSTATE 40001) on mismatch. The user retries the statement.PlanValidity::newandextend_dependenciesnow take&Catalogso they can read entries to hash; all callsites are updated.Testing:
Regression test in
test/race-condition-rotate-keys/mzcompose.pyruns ROTATE and SET concurrently with a 1ms stagger. It asserts no silent revert across the loop and that the OCC check fires at least once, so the test fails loudly if the race window ever moves out from under it.Closes SQL-272.