Skip to content

fix(concurrency): guard 20+ race conditions across getOrCreate, balance updates, and view permissions#2845

Open
hongwei1 wants to merge 20 commits into
OpenBankProject:developfrom
hongwei1:feature/concurrency-hazard-tests
Open

fix(concurrency): guard 20+ race conditions across getOrCreate, balance updates, and view permissions#2845
hongwei1 wants to merge 20 commits into
OpenBankProject:developfrom
hongwei1:feature/concurrency-hazard-tests

Conversation

@hongwei1

@hongwei1 hongwei1 commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Summary

This branch fixes 20+ concurrency race conditions identified through systematic hazard analysis and concurrency test suite execution. All 20 scenarios in CONCURRENCY_HAZARDS.md are now in the Gracefully handled tier.

Core fixes

  • Atomic balance updates — replace Lift Mapper read-modify-write with Doobie SELECT FOR UPDATE row-level lock (DoobieBankAccountQueries)
  • MFA double-spend preventionlockTransactionRequest now uses runUpdate (Strategy.default) instead of runQuery (Strategy.void) so the FOR UPDATE lock is held through the transaction on the fallback path
  • getOrCreate races — wrap all saveMe() in tryo + re-fetch on Failure across 9 providers: OAuth Consumer, SystemView, ViewPermission, AccountHolder, Entitlement, mapping providers (Customer / Transaction / Account), and BadLoginStatus
  • Scheduler stale-save — consent scheduler uses conditional UPDATE WHERE mStatus = guardStatus; removed direct mStatus(...).saveMe() that could resurrect revoked consents
  • View permission races — guard addPermission, addPermissions, revokePermission, and migrateViewPermissions against duplicate inserts; fix removeAllViewsAndVierPermissions to explicitly delete AccountAccess before bulkDelete_!! (which bypasses beforeDelete hooks)
  • Future counter atomicityincrementFutureCounter / decrementFutureCounter use UPDATE ... SET n = n ± 1 instead of read-increment-write

Post-review correctness fixes (5 additional)

  • lockTransactionRequest: runQueryrunUpdate (critical: lock was released before payment logic ran on non-request-scoped paths)
  • OAuth getOrCreateConsumer re-fetch: use (azp, sub) match; fall back to key when either is absent to avoid matching unrelated consumers via empty-string
  • removeAllViewsAndVierPermissions: explicit AccountAccess.bulkDelete_!! before ViewDefinition.bulkDelete_!!
  • LoginAttempts.incrementBadLoginAttempts: wrap first-time create.save in tryo
  • migrate_00000020.sql: dedup prerequisite for new UniqueIndex additions on mapperaccountholder and mappedentitlement

Codebase-wide audit (4 additional mapping provider fixes)

Full scan of the codebase revealed the same unguarded getOrCreate pattern in three ID-mapping providers and getOrCreateBadLoginStatus. All four fixed with the same tryo + re-fetch pattern.

Test plan

  • All 20 scenarios in obp-api/src/test/scala/code/concurrency/CONCURRENCY_HAZARDS.md pass
  • ConcurrentViewPermissionRaceTest — 4 scenarios (N, O, R, migrate)
  • CI shards 1–4 green
  • Run migrate_00000020.sql against any DB with pre-existing data before deploying

hongwei1 added 18 commits June 11, 2026 15:25
Adds a tagged ScalaTest suite that simulates 16 confirmed database
concurrency hazards in OBP-API: lost-update, check-then-act,
check-then-insert, unique-constraint-unhandled, and counter-sequence
races across money movement, security, consent scheduling, view
permissions, and OAuth user creation paths.

Each scenario asserts the theoretically-correct outcome so that a
hazard surfaces as a FAILED test (red bar = evidence the hazard is
real). Two safeguard scenarios (connection pool back-pressure,
per-request context isolation) are verified to PASS.

All scenarios are tagged ConcurrencyRace and excluded from the CI
main flow. See CONCURRENCY_HAZARDS.md for the full taxonomy, source
locations, and three-tier protection analysis.

Run only these tests:
  mvn -pl obp-commons,obp-api scalatest:test \
    -DtagsToInclude=code.concurrency.ConcurrencyRace \
    -DfailIfNoTests=false
…vel locking

Replace non-atomic read-modify-write counter updates in LoginAttempt.incrementBadLoginAttempts
and MappedChallengeProvider.validateChallenge with atomic Doobie SQL operations using
SELECT FOR UPDATE + UPDATE to serialize concurrent access at the database level.

Previously, concurrent requests all read the same starting counter value and each wrote
back an identical incremented value (lost-update), allowing attackers to:
- Send concurrent bad-login attempts without consuming the lockout threshold (Scenario H)
- Submit concurrent wrong challenge answers without exhausting the allowed-attempt budget (Scenario K)

The SELECT FOR UPDATE acquires an exclusive row lock before the UPDATE, forcing subsequent
transactions to wait and then re-read the current committed counter value rather than
operating on their snapshot value. Verified against H2 (MVCC/MVStore) in concurrency tests.
…consents

Replace unconditional .save calls in ConsentScheduler with conditional Doobie
UPDATEs that guard on the expected current status. If the status was changed
by a concurrent HTTP request between the scheduler's findAll and its update,
the WHERE clause does not match and the stale write is silently skipped (0 rows
updated), preserving the terminal status committed by the HTTP path.

Affected scheduler tasks:
- expiredBerlinGroupConsents: guards on mstatus IN ('valid','VALID')
- unfinishedBerlinGroupConsents: guards on mstatus = 'received'
- expiredObpConsents: guards on mstatus = 'ACCEPTED'

Fixes ConcurrentConsentRaceTest scenarios J and U.
… methods

Add UniqueIndex constraints where missing and wrap saveMe() calls in
scala.util.Try or tryo, re-fetching the committed row on constraint
violation so concurrent callers always get a usable result rather than
an unhandled JdbcSQLIntegrityConstraintViolationException.

Affected methods:
- MappedEntitlement.addEntitlement (C): UniqueIndex(bankId,userId,roleName) + tryo retry
- MapperAccountHolders.getOrCreateAccountHolder (D): UniqueIndex(user,bankPermalink,accPermalink) + tryo retry
- MapperCounterparties.getOrCreateMetadata (F): tryo retry on existing UniqueIndex(counterpartyId)
- OAuth.getOrCreateConsumer (W): re-fetch on Failure from existing tryo block
- MappedUserCustomerLink.getOCreateUserCustomerLink (L): Try retry on existing UniqueIndex(userId,customerId)
- LiftUsers.getOrCreateUserByProviderId (I): Try retry on existing UniqueIndex(provider,providerId)
…ccess rows

N: getOrCreateCustomPublicView — wrap createDefaultCustomPublicView in
   scala.util.Try; on constraint violation re-fetch the committed row.
   Prevents uncaught JdbcSQLIntegrityConstraintViolationException when two
   concurrent callers both see Empty and both try to insert the _public view.

O: resetViewPermissions — wrap ViewPermission.create.save in scala.util.Try
   so a concurrent reset that already inserted the same (bank,account,view,
   permission) row is handled gracefully instead of throwing.

R: ViewDefinition.beforeDelete cascade — add a beforeDelete hook that deletes
   all AccountAccess rows for the view before the view row is removed. Closes
   the TOCTOU window in removeCustomView's empty-check-then-delete sequence:
   any AccountAccess granted in that window is cleaned up atomically with the
   view deletion rather than orphaned.
…ter atomic

Replace the non-atomic getOrDefault + put pair with ConcurrentHashMap.compute,
which holds the internal segment lock for the duration of the read-modify-write.
This prevents concurrent callers from reading the same stale tuple and
overwriting each other's increments.
…ons against duplicate inserts

- MapperViews.getOrCreateSystemView: wrap createDefaultSystemView in scala.util.Try;
  re-fetch the committed row on constraint violation (mirrors the N fix for custom views)
- MapperViews.migrateViewPermissions: wrap the four bare ViewPermission.create.save call
  sites (system-view and custom-view branches for both boolean and list-valued permissions)
  in scala.util.Try; duplicate inserts from concurrent migrations are silently absorbed
- ConcurrentViewPermissionRaceTest: add 'migrate' scenario — two threads concurrently
  call migrateViewPermissions on a fresh custom view; asserts no exception and exactly
  one ViewPermission row per enabled permission
- CONCURRENCY_HAZARDS.md: update to 20 scenarios; document M/P/migrateViewPermissions
  as fixed; add migrate row to the View Permissions results table
- DoobieTransactionRequestQueries: use runUpdate instead of runQuery for
  lockTransactionRequest so SELECT FOR UPDATE holds the row lock on the
  fallback (non-request-context) path; runQuery's Strategy.void transactor
  released the lock before the caller's payment logic could run

- OAuth.getOrCreateConsumer: re-fetch after UniqueIndex failure now uses
  (azp, sub) only when both are present; when either is absent falls back
  to re-fetch by key to avoid matching an unrelated consumer stored with
  empty azp+sub fields

- MapperViews.removeAllViewsAndVierPermissions: explicitly delete
  AccountAccess rows before ViewDefinition.bulkDelete_!! because
  bulkDelete_!! bypasses beforeDelete hooks, leaving orphaned rows

- LoginAttempts.incrementBadLoginAttempts: wrap the first-time
  MappedBadLoginAttempt create in tryo so concurrent first-time bad
  logins don't propagate an uncaught JDBC UniqueIndex exception

- Add migrate_00000020.sql: dedup mapperaccountholder and mappedentitlement
  rows before the new UniqueIndex additions can be applied by Schemifier;
  without prior dedup Schemifier's CREATE UNIQUE INDEX aborts on any
  production DB that accumulated duplicate rows before this fix
…n status against duplicate inserts

Three ID-mapping providers (Customer, Transaction, Account) and
getOrCreateBadLoginStatus in LoginAttempts had the same check-then-insert
race: two concurrent callers both saw Empty, both called saveMe(), and
the losing thread threw an uncaught UniqueIndex violation.

Apply the standard tryo + re-fetch pattern to all four methods:
- wrap saveMe in tryo to absorb the UniqueIndex violation
- on Failure, re-fetch by the natural key to return the committed row
- getOrCreateBadLoginStatus: replace eager .or(Full(saveMe())) with
  an explicit match so saveMe is only called when find returns non-Full
@hongwei1 hongwei1 changed the title Feature/concurrency hazard tests fix(concurrency): guard 20+ race conditions across getOrCreate, balance updates, and view permissions Jun 23, 2026
… providers

- MappedCustomerIdMappingProvider: `case other => other.map(_.customerId)`
  fixes Box[MappedCustomerIdMapping] returned where Box[CustomerId] required
- MappedTransactionIdMappingProvider: same fix → `.map(_.transactionId)`
- MappedAccountIdMappingProvider: same fix → `.map(_.accountId)`
- LoginAttempts: add `Failure` to the net.liftweb.common import so the
  inner tryo match arm compiles (was: not found: value Failure)
@hongwei1 hongwei1 closed this Jun 23, 2026
@hongwei1 hongwei1 reopened this Jun 23, 2026
Replace opaque (c, o) tuple destructuring with (totalCallCount, openFuturesCount)
in incrementFutureCounter and decrementFutureCounter. Add inline comments explaining
the atomicity guarantee from ConcurrentHashMap.compute, the role of each counter, and
the null-guard initialisation behaviour in the decrement path.
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant