feat(iv): OperationRepo IV runtime — pre-HYDRATE deferral, FAIL_UNAUTHORIZED handler, IdentityVerificationService (3/6)#2625
Conversation
0caf0f3 to
aec3996
Compare
9f21c64 to
89c5d54
Compare
|
@claude review |
There was a problem hiding this comment.
My prior concern about the pre-HYDRATE deferral deadlocking when the backend omits require_ident_auth is addressed — the gate is now on isInitializedWithRemote and there's a regression test for the backend-silent case. No new bugs found on this pass, but this PR is complex and touches identity-verification/auth-sensitive queue runtime across several files, so a human review is still warranted.
Extended reasoning...
Overview
PR 3 of 6 in the Identity Verification rollout, wiring queue-level IV behavior onto OperationRepo. Changes span: (1) a pre-HYDRATE deferral at the top of getNextOps so the queue waits until remote params are fetched before dispatching ops, (2) a new FAIL_UNAUTHORIZED handler path that invalidates the JWT, fires a developer-facing handler, and re-queues ops (when IV is active), (3) removeOperationsWithoutExternalId to purge anon ops from both the in-memory queue and the persistent OperationModelStore, (4) a new IdentityVerificationService IStartableService that subscribes to ConfigModelStore HYDRATE events and schedules purge + wake via suspendifyOnIO + awaitInitialized to fix a cold-start race, and (5) outer/inner two-layer gating on IdentityVerificationGates.newCodePathsRun and ivBehaviorActive. 11 new unit tests plus a regression test for the backend-silent deadlock case I previously flagged.
Security risks
This PR is squarely in auth-sensitive territory: it changes when and how operations are dispatched based on JWT/IV state, adds a handler for 401 Unauthorized responses, and introduces anonymous-op purging. The handler-fires-before-waiter-wake ordering in handleFailUnauthorized is load-bearing for correctness of enqueueAndWait callers. The two-layer gating (newCodePathsRun outer + ivBehaviorActive inner) is a risk-management mechanism for staged rollout — if either gate is miscomputed, Phase 1/3 users could see IV behavior they shouldn't. The pre-HYDRATE deferral is now correctly gated on isInitializedWithRemote rather than the IV param itself, so no deadlock on silent-backend responses.
Level of scrutiny
High. This touches the core operation dispatch path that every network-bound SDK action flows through (push subscription creates, tag updates, logins, outcomes, etc.). Regressions here could silently stall the entire queue for an app. The IV-specific logic is also directly on the auth/identity path. My prior bug report was the kind of issue that would have slipped past the existing test suite (an UNKNOWN test existed but only asserted forceExecuteOperations was called, not that any op actually dispatched). The new post-HYDRATE with backend silent on require_ident_auth test plus the semantic shift to isInitializedWithRemote closes that hole properly.
Other factors
The PR description is thorough and accurate — it correctly calls out the rationale for taking JwtTokenStore directly rather than abstracting behind an IJwtStore interface, and it explains the layering/Phase matrix clearly. Test coverage is strong for the new paths (6 OperationRepoTests, 5 IdentityVerificationServiceTests). The two pre-existing SDKInitTests failures noted as unrelated have been consistently failing on the integration branch across prior PRs. A code owner (nan-li) is the one who pinged me to re-review, signaling they're already engaged. Given the scope, complexity, cross-PR coupling, and the security-sensitive nature of the surface area, this is exactly the kind of change that benefits from a human code owner's review rather than bot approval.
3c2e5f9 to
e5a77f9
Compare
0ef3936 to
92e1365
Compare
5d8f464 to
f51c9f2
Compare
…HORIZED handler, IdentityVerificationService PR 3 of 6 against the Identity Verification integration branch — wires queue- level IV behavior on top of PR 2's `Operation.externalId` refactor. Interface additions on `IOperationRepo`: - `setJwtInvalidatedHandler((externalId: String) -> Unit)?)` — single-handler setter; PR 5's UserManager will wire in a lambda that fires the public listener EventProducer. - `removeOperationsWithoutExternalId()` — drops anon ops from both in-memory queue and the persistent store, so they don't resurrect on cold start. `OperationRepo`: - Injects `JwtTokenStore` (no cross-package-interface abstraction since core already imports user-layer classes elsewhere — 34 existing imports make the layering boundary organizational, not architectural). - Pre-HYDRATE deferral at the top of `getNextOps`: when `useIdentityVerification == JwtRequirement.UNKNOWN`, return null so the loop waits. Applies unconditionally because we can't tell Phase 1 / 2 / 3 apart pre-HYDRATE; brief first-launch delay for non-IV users is tolerable (typically <2s, cached for subsequent launches). - Outer `newCodePathsRun` dispatch at `getNextOps` filter and the FAIL_UNAUTHORIZED executor-result branch. IV-specific logic lives in a new `OperationRepoIvExtensions.kt`. `OperationRepoIvExtensions.kt` (side-by-side with OperationRepo per the PR plan): - `hasValidJwtIfRequired(jwtTokenStore, op)`: inner `ivBehaviorActive` check short-circuits to `true` when IV is inactive (Phase 3 runs the new paths without actual JWT gating). Otherwise checks `requiresJwt`, then presence of a JWT for the op's `externalId`. - `handleFailUnauthorized(startingOp, ops, jwtTokenStore, handler)`: when IV is active and the op has an externalId, invalidates the JWT, fires the handler, wakes waiters with `false`, and re-queues ops without their waiters (so they can retry once a fresh JWT arrives). Returns `false` when IV inactive or op is anon, so the base class falls through to the default drop-on-fail path. Handler fires BEFORE waiters wake so `enqueueAndWait` callers can't return before the handler propagates. New class `IdentityVerificationService`: - Subscribes to `ConfigModelStore`; reacts only to HYDRATE. - Schedules the purge via `suspendifyOnIO` + `awaitInitialized()` — fix for an earlier race where anon-op purge ran against an empty in-memory queue on cold start because it fired before `loadSavedOperations` completed. - When the new state is `JwtRequirement.REQUIRED`, purges anon ops. Always calls `forceExecuteOperations` to wake the queue from the pre-HYDRATE deferral. Tests (+11): - 5 `IdentityVerificationServiceTests`: subscribe behavior, HYDRATE with each requirement (REQUIRED purges + wakes, NOT_REQUIRED wakes only, UNKNOWN wakes only), non-HYDRATE tag ignored. `coVerifyOrder` asserts purge runs after `awaitInitialized()` — the race-fix invariant. - 6 new OperationRepoTests: pre-HYDRATE deferral (op not executed, queue preserved); flip-to-NOT_REQUIRED unblocks; `removeOperationsWithoutExternalId` filters queue and persists removal; `setJwtInvalidatedHandler` round-trip; FAIL_UNAUTHORIZED with IV active invalidates JWT + fires handler + re-queues; FAIL_UNAUTHORIZED with IV inactive falls through to default drop. Existing 791 tests still pass (2 pre-existing `SDKInitTests` failures on the integration branch remain; unrelated).
… UNKNOWN Bot caught a regression: deferral keyed on `useIdentityVerification == UNKNOWN` permanently stalls the queue whenever the backend response omits `require_ident_auth` (older deployments, dev/test environments, partial rollouts — the `TODO: New` comment in ParamsBackendService explicitly flags this field as newly-added). HYDRATE flips `isInitializedWithRemote` to true but leaves `useIdentityVerification` at UNKNOWN; `getNextOps` re-defers on every wake and ops never dispatch. Switch the gate to `!isInitializedWithRemote` — set unconditionally by `ConfigModelStoreListener` on any successful HYDRATE, so it correctly signals "HYDRATE has completed" regardless of which keys the backend returned. Post- HYDRATE with a silent backend, `useIdentityVerification` stays UNKNOWN and `ivBehaviorActive` is false, so ops flow through the old code path normally (no JWT gating applied, matching Phase 1 behavior). Tests: - Updated both existing pre-HYDRATE tests to flip `isInitializedWithRemote`. - New regression-guard test: post-HYDRATE with backend silent on the field (isInitializedWithRemote=true, useIdentityVerification=UNKNOWN) dispatches ops instead of deadlocking. - `MockHelper.configModelStore` defaults (in the core-module test helpers) now also set `isInitializedWithRemote=true` so tests that don't exercise pre-HYDRATE aren't accidentally held up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nService to OperationRepo IdentityVerificationService was orchestrating OperationRepo's lifecycle (suspendifyOnIO, awaitInitialized, conditional purge, forceExecute) — that's queue-state work, not service-of-config work. Add IOperationRepo.onJwtConfigHydrated(ivRequired: Boolean) so the service becomes a thin notifier and OperationRepo owns its own post-HYDRATE maintenance. Body lives in OperationRepoIvExtensions.kt so the extension file deletes cleanly on rollout removal; the override on OperationRepo is a one-line delegate. Interface takes Boolean rather than JwtRequirement so IOperationRepo doesn't import an internal IV type (the dispatch only cares about REQUIRED-vs-not anyway). Service tests trim to verifying the forwarding contract; orchestration coverage moves to OperationRepoTests using the existing spyk pattern.
f51c9f2 to
12bb934
Compare
Two correctness changes plus small cleanup: 1. Switch the pre-HYDRATE deferral gate from `isInitializedWithRemote` to `useIdentityVerification == UNKNOWN`. The former is wrong on upgrade: pre-IV SDKs persisted it as `true` without ever reading `jwt_required`, so cached config from an old-SDK upgrade looks "initialized" while `useIdentityVerification` is still UNKNOWN and ops would dispatch before we know whether IV is required. The original deadlock concern that motivated `isInitializedWithRemote` was hypothetical — `require_ident_auth` was never sent by the backend, so there's no installed base whose successful HYDRATE leaves the field UNKNOWN. The IV-aware backend ships alongside this SDK with `jwt_required` populated. 2. Rename the remote-params field from `require_ident_auth` to `jwt_required`, matching the IV-aware backend. 3. Inline `onJwtConfigHydratedIv` into the OperationRepo override instead of bouncing through the extension. Removes one indirection; the body just sequences existing OperationRepo methods. 4. Tests updated to gate on `useIdentityVerification == UNKNOWN`. The "no deadlock when backend silent" test is dropped — the scenario it protected against doesn't occur with the IV-aware backend. 5. handleFailUnauthorized: switch from try/catch to runCatching; downgrade the "JWT invalidated" log from warn to info; include externalId in the handler-failure log. 6. processQueueForever: include a queue snapshot in the per-iteration debug log so it's clearer what was visible to getNextOps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5edd8f8 to
9b83d03
Compare
5cdec42 to
021e0d9
Compare
It means it is turned off
2887db9 to
2227767
Compare
|
@claude review |
…onService Drop the object singleton and the duplicated featureFlagOn field. The service now derives newCodePathsRun/ivBehaviorActive on read from injected IFeatureManager and ConfigModelStore, so FeatureManager is the single source of truth for the SDK rollout flag. OperationRepo registers a HYDRATE handler in start() (same pattern as setJwtInvalidatedHandler) to break the would-be DI cycle; onJwtConfigHydrated moves off IOperationRepo and becomes an internal fun on the concrete class.
Move _operationModelStore.add inside synchronized(queue) so the queue add and store add are atomic w.r.t. the IO-side purge. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2227767 to
09c9246
Compare
| val ops = getNextOps(executeBucket) | ||
| Logging.debug("processQueueForever:ops:\n$ops") | ||
| val queueSnapshotForLogging = synchronized(queue) { queue.toList() } | ||
| Logging.debug("processQueueForever:ops:\n$ops\nqueue(${queueSnapshotForLogging.size}):\n$queueSnapshotForLogging") | ||
|
|
There was a problem hiding this comment.
🟡 OperationRepo.kt:245-246 unconditionally calls synchronized(queue) { queue.toList() } and interpolates the full queue every iteration of processQueueForever, regardless of log level. Logging.debug(message: String) is eager (level filter is in logToLogcat at Logging.kt:142, downstream of message construction), so in production (default logLevel = WARN, Logging.kt:59) this pays for a queue-lock acquisition + shallow copy + N Operation.toJSON().toString() calls per loop iteration for output that's immediately discarded. Wrap in if (Logging.atLogLevel(LogLevel.DEBUG)) { ... } (the existing idiom, used at WebViewManager.kt:459) to make this pay-for-what-you-use.
Extended reasoning...
Bug
The pre-PR debug line was Logging.debug("processQueueForever:ops:\n$ops") — interpolation of just the ops batch about to execute. This PR adds a second piece of work on top:
val queueSnapshotForLogging = synchronized(queue) { queue.toList() }
Logging.debug("processQueueForever:ops:\n$ops\nqueue(${queueSnapshotForLogging.size}):\n$queueSnapshotForLogging")Three new costs are paid on every loop iteration regardless of log level:
synchronized(queue)— acquires the queue lock, contending withinternalEnqueue(line 218),getNextOps(line 502), and the newremoveOperationsWithoutExternalId(line 274).queue.toList()— shallow copy of the entire in-memory queue."$queueSnapshotForLogging"string template — callsList<OperationQueueItem>.toString(), which callsOperationQueueItem.toString()(OperationRepo.kt:50-52), which interpolates$operation→Operation.toString()(Operation.kt:77-79:return toJSON().toString()) — i.e. a JSON serialization per queued op.
Why existing code does not prevent it
Logging.debug(message: String, ...) (Logging.kt:75-81) takes a String — Kotlin builds the template at the call site before debug() is even entered. The level filter sits inside logToLogcat at Logging.kt:142, after message construction. There is no lazy-lambda overload. Default logLevel = LogLevel.WARN (Logging.kt:59), so DEBUG is filtered in production and all the work above is wasted.
The codebase already has the right idiom: WebViewManager.kt:459 uses if (Logging.atLogLevel(LogLevel.DEBUG)) { ... } around expensive log-message construction. Logging.atLogLevel(level) (Logging.kt:65) is the team-sanctioned guard.
Step-by-step proof
- Production app starts with default
logLevel = WARN. processQueueForeverruns its main loop. Each iteration reaches line 245.- Kotlin evaluates the right-hand side of
val queueSnapshotForLogging = ...: enterssynchronized(queue), callsqueue.toList(), exits. - Kotlin evaluates the string template on line 246: calls
queueSnapshotForLogging.toString()→OperationQueueItem.toString()per element →Operation.toString()per element →toJSON().toString()per element. N JSON serializations. - The fully-built String is passed to
Logging.debug(...). Logging.debugcallslog()→logToLogcat()(Logging.kt:142), which checkslevel.compareTo(logLevel) >= 0— DEBUG (0) < WARN (3), filter rejects, message is discarded.
All the work in steps 3–4 is for output nothing reads.
Impact
Bounded but real. processQueueForever is event-driven (parks in waitForNewOperationAndExecutionInterval when idle), so it only iterates per enqueue/wakeup, and real-world queue sizes are small. No correctness issue. But this is a clean PR-introduced regression of constant per-iteration cost — pre-PR only stringified the executing batch (ops), this PR adds a full-queue snapshot on top.
Fix
val ops = getNextOps(executeBucket)
if (Logging.atLogLevel(LogLevel.DEBUG)) {
val queueSnapshotForLogging = synchronized(queue) { queue.toList() }
Logging.debug("processQueueForever:ops:\n$ops\nqueue(${queueSnapshotForLogging.size}):\n$queueSnapshotForLogging")
}This matches WebViewManager.kt:459 and makes the cost pay-for-what-you-use.
| get() = featureManager.isEnabled(FeatureFlag.SDK_IDENTITY_VERIFICATION) || ivBehaviorActive | ||
|
|
||
| @Volatile | ||
| private var _onJwtConfigHydrated: ((ivRequired: Boolean) -> Unit)? = null |
There was a problem hiding this comment.
make a that val and try to use a sync block instead.
| anonymous.forEach { it.waiter?.wake(false) } | ||
| queue.removeAll(anonymous) | ||
| Logging.debug("OperationRepo: removeOperationsWithoutExternalId removed ${anonymous.size} of ${anonymous.size + queue.size} operations") | ||
| anonymous.map { it.operation.id } |
| ) | ||
| if (!handled) { | ||
| // IV inactive or anon op: drop and wake waiters, matching FAIL_NORETRY. | ||
| Logging.warn("Operation execution failed without retry: $operations") |
There was a problem hiding this comment.
can we create a resuable method/extension method instead of duplicating code?
| * developer-facing [com.onesignal.user.internal.jwt.IUserJwtInvalidatedListener]s | ||
| * (single handler — not multi-subscriber). | ||
| */ | ||
| fun setJwtInvalidatedHandler(handler: ((externalId: String) -> Unit)?) |
There was a problem hiding this comment.
lets check if there is a better way to do this.
Description
One Line Summary
PR 3 of 6 against the Identity Verification integration branch — wires queue-level IV behavior (pre-HYDRATE deferral, FAIL_UNAUTHORIZED handler, anonymous-op purge) on top of PR 2's
Operation.externalIdbase-class refactor.Details
Motivation
PR 1 landed the gate state (
IdentityVerificationGates) and JWT store; PR 2 gave every operation anexternalId. This PR is where IV starts to actually do something at runtime. It lands the three behavioral pieces ofOperationRepo+IdentityVerificationServicethat are necessary before the public API can be added:jwt_requiredis true or false, otherwise a Phase 2 user's first-launch ops would race past the JWT gate and 401.FAIL_UNAUTHORIZEDhandling — without this, a 401 drops the op and the developer has no way to recover; the SDK needs to notify the app so it can supply a fresh JWT, then retry.Scope
IOperationRepogains two methods (both used only by internal SDK components — external callers don't see these change on the public surface):fun setJwtInvalidatedHandler(handler: ((externalId: String) -> Unit)?)— single-handler setter, set once by PR 5'sUserManagerwith a lambda that fires the developer-facing listenerEventProducer. Intentionally not anEventProducerhere: there's exactly one internal subscriber.fun removeOperationsWithoutExternalId()— drops anon ops from both the in-memory queue and the persistentOperationModelStoreso they don't resurrect on cold start.OperationRepo:JwtTokenStoredirectly. We considered a narrowIJwtStoreinterface defined in core (implemented by user-layerJwtTokenStore) to preserve core→user layering, but foundOperationRepoalready imports user-layer classes directly (e.g.LoginUserOperation,NewRecordsState), and 34 core files import from user overall. The core/user split is organizational, not architectural, so adding an abstraction for this one case would be inconsistent with the codebase.getNextOps:useIdentityVerificationis cached inConfigModelafter the first successful params fetch so subsequent launches skip this.newCodePathsRundispatch at thegetNextOpsfilter and theFAIL_UNAUTHORIZEDexecutor-result branch. Ops on the old code path (feature flag off,jwt_required=false) are unchanged. SplitFAIL_UNAUTHORIZEDout of the combinedFAIL_NORETRY/FAIL_CONFLICTbranch because its handling is now conditional.New
OperationRepoIvExtensions.kt(side-by-side withOperationRepo.ktper the PR plan's Option A file layout):hasValidJwtIfRequired(jwtTokenStore, op)— innerivBehaviorActiveguard short-circuits totruewhen IV behavior is off, so Phase 3 users (new code paths on, IV off) run through the dispatch site but no actual JWT gating happens. Returnsfalsefor anonymous ops when IV is active; otherwise checksjwtTokenStore.getJwt(op.externalId) != null.handleFailUnauthorized(startingOp, ops, jwtTokenStore, handler)— same inner gate. Returnsfalsewhen IV inactive or op is anon so the caller falls through to the default drop-on-fail path. When handled: invalidates the JWT, fires the handler before waking waiters (otherwise anenqueueAndWaitcaller could return before the handler propagates to the app), wakes waiters withfalse, and re-queues ops without their waiters so they can retry once a fresh JWT arrives.Both extensions take their dependencies as parameters rather than accessing
OperationRepoprivate fields — keeps encapsulation intact and the extensions pure-ish.New
IdentityVerificationService(DI-registeredIStartableService):ConfigModelStore; reacts only to HYDRATE.suspendifyOnIO { operationRepo.awaitInitialized(); ... }— fix for an earlier race where anon-op purge ran against an empty in-memory queue on cold start because it fired beforeloadSavedOperationscompleted.useIdentityVerificationisREQUIRED. Always callsforceExecuteOperationsto wake the queue from the pre-HYDRATE deferral regardless of the new state.CoreModuleregistersIdentityVerificationServiceas anIStartableService.Two-layer gating
The memory-plan design is preserved:
OperationRepo.kt):if (IdentityVerificationGates.newCodePathsRun) extensionFunction(...). When off, old code path runs untouched.if (!IdentityVerificationGates.ivBehaviorActive) return <no-op>. Lets Phase 3 users (new code paths on, IV off) run through the new dispatch without any actual IV gating applied — this is how we validate the new code paths structurally before flipping IV active for each customer.What is NOT in this PR
OneSignal.login(externalId, jwt),updateUserJwt, JWT-invalidated listeners) — PR 4.Authorization: Bearerheader attachment — PR 5.JwtTokenStore.pruneToExternalIds()invocation on cold start — deferred; not needed yet.Testing
Unit testing
11 new tests, 789 existing tests still pass (2 pre-existing
SDKInitTestsfailures on the integration branch are unrelated — same 2 that fail onfeat/iv-foundation-01/feat/iv-oprepo-gating-02).New
IdentityVerificationServiceTests(5):start subscribes to ConfigModelStore.HYDRATE with REQUIRED purges anon ops then wakes the queue (after awaitInitialized)— usescoVerifyOrderto assert the purge runs afterawaitInitialized(), which is the whole point of the race fix.HYDRATE with NOT_REQUIRED wakes the queue but does NOT purge.HYDRATE with UNKNOWN wakes the queue (no purge).non-HYDRATE model replacement is ignored.New
OperationRepoTests(6):useIdentityVerificationtoNOT_REQUIREDand callingforceExecuteOperationsunblocks the queue.removeOperationsWithoutExternalIdfilters the queue AND removes the persisted ops from the store.setJwtInvalidatedHandlerround-trip via internal accessor.FAIL_UNAUTHORIZEDwith IV active: invalidates JWT, fires handler, re-queues ops (not dropped), wakesenqueueAndWaitcaller withfalse.FAIL_UNAUTHORIZEDwith IV inactive: handler NOT fired, default drop-on-fail path runs (op removed from store).Each IV test scopes its
IdentityVerificationGatesmutations intry/finallyto reset the singleton;Mocks()setsconfigModel.useIdentityVerification = JwtRequirement.NOT_REQUIREDby default so the pre-HYDRATE deferral doesn't hold up tests that aren't exercising IV state.Manual testing
Not applicable for this PR — no runtime behavior change for Phase 1 users (feature flag off,
jwt_requiredeventuallyfalse) once HYDRATE completes; IV-specific behavior is gated off pending the rest of the rollout series. The pre-HYDRATE deferral is covered by unit tests.Affected code checklist
Checklist
Overview
Testing
SDKInitTestsfailures are unrelated — same 2 on integration branch)Final pass