fix(concurrency): guard 20+ race conditions across getOrCreate, balance updates, and view permissions#2845
Open
hongwei1 wants to merge 20 commits into
Open
Conversation
Adds a tagged ScalaTest suite that simulates 16 confirmed database
concurrency hazards in OBP-API: lost-update, check-then-act,
check-then-insert, unique-constraint-unhandled, and counter-sequence
races across money movement, security, consent scheduling, view
permissions, and OAuth user creation paths.
Each scenario asserts the theoretically-correct outcome so that a
hazard surfaces as a FAILED test (red bar = evidence the hazard is
real). Two safeguard scenarios (connection pool back-pressure,
per-request context isolation) are verified to PASS.
All scenarios are tagged ConcurrencyRace and excluded from the CI
main flow. See CONCURRENCY_HAZARDS.md for the full taxonomy, source
locations, and three-tier protection analysis.
Run only these tests:
mvn -pl obp-commons,obp-api scalatest:test \
-DtagsToInclude=code.concurrency.ConcurrencyRace \
-DfailIfNoTests=false
…into feature/concurrency-hazard-tests
…json4s in concurrency tests
…locking and fallback transactor
…lect global Http4s proxy usage
…vel locking Replace non-atomic read-modify-write counter updates in LoginAttempt.incrementBadLoginAttempts and MappedChallengeProvider.validateChallenge with atomic Doobie SQL operations using SELECT FOR UPDATE + UPDATE to serialize concurrent access at the database level. Previously, concurrent requests all read the same starting counter value and each wrote back an identical incremented value (lost-update), allowing attackers to: - Send concurrent bad-login attempts without consuming the lockout threshold (Scenario H) - Submit concurrent wrong challenge answers without exhausting the allowed-attempt budget (Scenario K) The SELECT FOR UPDATE acquires an exclusive row lock before the UPDATE, forcing subsequent transactions to wait and then re-read the current committed counter value rather than operating on their snapshot value. Verified against H2 (MVCC/MVStore) in concurrency tests.
…consents
Replace unconditional .save calls in ConsentScheduler with conditional Doobie
UPDATEs that guard on the expected current status. If the status was changed
by a concurrent HTTP request between the scheduler's findAll and its update,
the WHERE clause does not match and the stale write is silently skipped (0 rows
updated), preserving the terminal status committed by the HTTP path.
Affected scheduler tasks:
- expiredBerlinGroupConsents: guards on mstatus IN ('valid','VALID')
- unfinishedBerlinGroupConsents: guards on mstatus = 'received'
- expiredObpConsents: guards on mstatus = 'ACCEPTED'
Fixes ConcurrentConsentRaceTest scenarios J and U.
… methods Add UniqueIndex constraints where missing and wrap saveMe() calls in scala.util.Try or tryo, re-fetching the committed row on constraint violation so concurrent callers always get a usable result rather than an unhandled JdbcSQLIntegrityConstraintViolationException. Affected methods: - MappedEntitlement.addEntitlement (C): UniqueIndex(bankId,userId,roleName) + tryo retry - MapperAccountHolders.getOrCreateAccountHolder (D): UniqueIndex(user,bankPermalink,accPermalink) + tryo retry - MapperCounterparties.getOrCreateMetadata (F): tryo retry on existing UniqueIndex(counterpartyId) - OAuth.getOrCreateConsumer (W): re-fetch on Failure from existing tryo block - MappedUserCustomerLink.getOCreateUserCustomerLink (L): Try retry on existing UniqueIndex(userId,customerId) - LiftUsers.getOrCreateUserByProviderId (I): Try retry on existing UniqueIndex(provider,providerId)
…ccess rows N: getOrCreateCustomPublicView — wrap createDefaultCustomPublicView in scala.util.Try; on constraint violation re-fetch the committed row. Prevents uncaught JdbcSQLIntegrityConstraintViolationException when two concurrent callers both see Empty and both try to insert the _public view. O: resetViewPermissions — wrap ViewPermission.create.save in scala.util.Try so a concurrent reset that already inserted the same (bank,account,view, permission) row is handled gracefully instead of throwing. R: ViewDefinition.beforeDelete cascade — add a beforeDelete hook that deletes all AccountAccess rows for the view before the view row is removed. Closes the TOCTOU window in removeCustomView's empty-check-then-delete sequence: any AccountAccess granted in that window is cleaned up atomically with the view deletion rather than orphaned.
…ter atomic Replace the non-atomic getOrDefault + put pair with ConcurrentHashMap.compute, which holds the internal segment lock for the duration of the read-modify-write. This prevents concurrent callers from reading the same stale tuple and overwriting each other's increments.
…ons against duplicate inserts - MapperViews.getOrCreateSystemView: wrap createDefaultSystemView in scala.util.Try; re-fetch the committed row on constraint violation (mirrors the N fix for custom views) - MapperViews.migrateViewPermissions: wrap the four bare ViewPermission.create.save call sites (system-view and custom-view branches for both boolean and list-valued permissions) in scala.util.Try; duplicate inserts from concurrent migrations are silently absorbed - ConcurrentViewPermissionRaceTest: add 'migrate' scenario — two threads concurrently call migrateViewPermissions on a fresh custom view; asserts no exception and exactly one ViewPermission row per enabled permission - CONCURRENCY_HAZARDS.md: update to 20 scenarios; document M/P/migrateViewPermissions as fixed; add migrate row to the View Permissions results table
- DoobieTransactionRequestQueries: use runUpdate instead of runQuery for lockTransactionRequest so SELECT FOR UPDATE holds the row lock on the fallback (non-request-context) path; runQuery's Strategy.void transactor released the lock before the caller's payment logic could run - OAuth.getOrCreateConsumer: re-fetch after UniqueIndex failure now uses (azp, sub) only when both are present; when either is absent falls back to re-fetch by key to avoid matching an unrelated consumer stored with empty azp+sub fields - MapperViews.removeAllViewsAndVierPermissions: explicitly delete AccountAccess rows before ViewDefinition.bulkDelete_!! because bulkDelete_!! bypasses beforeDelete hooks, leaving orphaned rows - LoginAttempts.incrementBadLoginAttempts: wrap the first-time MappedBadLoginAttempt create in tryo so concurrent first-time bad logins don't propagate an uncaught JDBC UniqueIndex exception - Add migrate_00000020.sql: dedup mapperaccountholder and mappedentitlement rows before the new UniqueIndex additions can be applied by Schemifier; without prior dedup Schemifier's CREATE UNIQUE INDEX aborts on any production DB that accumulated duplicate rows before this fix
…n status against duplicate inserts Three ID-mapping providers (Customer, Transaction, Account) and getOrCreateBadLoginStatus in LoginAttempts had the same check-then-insert race: two concurrent callers both saw Empty, both called saveMe(), and the losing thread threw an uncaught UniqueIndex violation. Apply the standard tryo + re-fetch pattern to all four methods: - wrap saveMe in tryo to absorb the UniqueIndex violation - on Failure, re-fetch by the natural key to return the committed row - getOrCreateBadLoginStatus: replace eager .or(Full(saveMe())) with an explicit match so saveMe is only called when find returns non-Full
… providers - MappedCustomerIdMappingProvider: `case other => other.map(_.customerId)` fixes Box[MappedCustomerIdMapping] returned where Box[CustomerId] required - MappedTransactionIdMappingProvider: same fix → `.map(_.transactionId)` - MappedAccountIdMappingProvider: same fix → `.map(_.accountId)` - LoginAttempts: add `Failure` to the net.liftweb.common import so the inner tryo match arm compiles (was: not found: value Failure)
Replace opaque (c, o) tuple destructuring with (totalCallCount, openFuturesCount) in incrementFutureCounter and decrementFutureCounter. Add inline comments explaining the atomicity guarantee from ConcurrentHashMap.compute, the role of each counter, and the null-guard initialisation behaviour in the decrement path.
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Summary
This branch fixes 20+ concurrency race conditions identified through systematic hazard analysis and concurrency test suite execution. All 20 scenarios in
CONCURRENCY_HAZARDS.mdare now in the Gracefully handled tier.Core fixes
SELECT FOR UPDATErow-level lock (DoobieBankAccountQueries)lockTransactionRequestnow usesrunUpdate(Strategy.default) instead ofrunQuery(Strategy.void) so theFOR UPDATElock is held through the transaction on the fallback pathsaveMe()intryo+ re-fetch onFailureacross 9 providers: OAuth Consumer, SystemView, ViewPermission, AccountHolder, Entitlement, mapping providers (Customer / Transaction / Account), and BadLoginStatusUPDATE WHERE mStatus = guardStatus; removed directmStatus(...).saveMe()that could resurrect revoked consentsaddPermission,addPermissions,revokePermission, andmigrateViewPermissionsagainst duplicate inserts; fixremoveAllViewsAndVierPermissionsto explicitly deleteAccountAccessbeforebulkDelete_!!(which bypassesbeforeDeletehooks)incrementFutureCounter/decrementFutureCounteruseUPDATE ... SET n = n ± 1instead of read-increment-writePost-review correctness fixes (5 additional)
lockTransactionRequest:runQuery→runUpdate(critical: lock was released before payment logic ran on non-request-scoped paths)getOrCreateConsumerre-fetch: use(azp, sub)match; fall back tokeywhen either is absent to avoid matching unrelated consumers via empty-stringremoveAllViewsAndVierPermissions: explicitAccountAccess.bulkDelete_!!beforeViewDefinition.bulkDelete_!!LoginAttempts.incrementBadLoginAttempts: wrap first-timecreate.saveintryomigrate_00000020.sql: dedup prerequisite for newUniqueIndexadditions onmapperaccountholderandmappedentitlementCodebase-wide audit (4 additional mapping provider fixes)
Full scan of the codebase revealed the same unguarded
getOrCreatepattern in three ID-mapping providers andgetOrCreateBadLoginStatus. All four fixed with the sametryo+ re-fetch pattern.Test plan
obp-api/src/test/scala/code/concurrency/CONCURRENCY_HAZARDS.mdpassConcurrentViewPermissionRaceTest— 4 scenarios (N, O, R, migrate)migrate_00000020.sqlagainst any DB with pre-existing data before deploying