fix(store): omit empty statuses from findOldestCreatedAt (false lag-gauge reading)#58
Merged
Conversation
…auge reading)
PostgresOutboxStore and MysqlOutboxStore pre-seeded the result map with
clock.instant() for every requested status before querying, then let the SQL
overwrite only the statuses that have rows. A status with no entries therefore
came back with a near-now timestamp instead of being absent.
The lag gauge (okapi-micrometer) relies on absence to report 0 for empty
statuses: `snapshot.oldest[status]?.let { Duration.between(it, now) } ?: 0.0`.
With the pre-seed, an idle status reported a small false lag (the elapsed time
between the store query and the gauge publish) instead of 0.
The unit test never caught this: it uses a stub whose findOldestCreatedAt does
`filterKeys { it in statuses }` (correctly omitting absent statuses), so the
stub and the real adapters disagreed. Fixed by starting from an empty map so
absent statuses are omitted and the gauge's `?: 0.0` fallback applies.
Added a shared OutboxStore contract test (runs against real Postgres AND MySQL
via Testcontainers) asserting that findOldestCreatedAt returns a key only for
statuses that actually have rows.
Note: `clock` is now unused by both stores. Removing it is a breaking
constructor change touching ~20 call sites (autoconfig, benchmarks, tests) and
is left to a separate follow-up to keep this fix focused.
The absent-means-no-backlog contract was implicit and was the root of the lag-gauge bug just fixed. Stating it on the interface inoculates against a future implementer re-seeding defaults and silently reintroducing it. (Code review follow-up.)
Refines the previous KDoc: spells out that the map is status -> createdAt of that status's oldest entry (min createdAt), not just 'oldest createdAt per status'. Keeps the absent-means-no-backlog contract note.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The bug
PostgresOutboxStoreandMysqlOutboxStoreboth pre-seeded the result map withclock.instant()for every requested status before running the query:A status with no rows stayed at the seeded near-now timestamp instead of being absent from the map.
Why it matters
The lag gauge in
okapi-micrometerrelies on absence to report 0 for empty statuses:With the pre-seed, an idle status (e.g. no PENDING entries) reported a small false lag — the elapsed time between the store query and the gauge publish — instead of
0. A dashboard would show a non-zero lag for a status that has nothing queued.Why no test caught it
The existing unit test uses a stub whose
findOldestCreatedAtdoesoldest.filterKeys { it in statuses }— i.e. it correctly omits absent statuses. So the stub and the real JDBC adapters disagreed: the test was green while production diverged. This is a stub-more-correct-than-impl trap.The fix
Start from an empty map so absent statuses are omitted; the gauge's
?: 0.0fallback then applies.Applied identically to both adapters (the bug was copy-pasted from one to the other).
Test
Added one test to the shared
OutboxStoreContractTests— so it runs against real Postgres and real MySQL (Testcontainers), not a stub. It persists PENDING entries only, then assertsfindOldestCreatedAtreturns a key only for statuses that have rows (PENDING present with the oldestcreated_at; DELIVERED/FAILED absent). The assertion is on the key set, which catches the structural bug even under a fixed test clock (where a value-only assertion would deceptively pass).Test plan
./gradlew :okapi-postgres:compileKotlin :okapi-mysql:compileKotlin— clean (no warnings):okapi-integration-tests:compileTestKotlin— new test compiles./gradlew ktlintCheck— green./gradlew :okapi-integration-tests:testwith Docker up before merging.Note — deferred cleanup
clockis now unused by both stores (it existed only for the buggy pre-seed; Kotlin emits no warning for an unused constructorval). Removing it is a breaking constructor change touching ~20 call sites (Spring autoconfig bean factories, benchmarks, ~15 tests) and is intentionally left to a separate follow-up to keep this bugfix focused and easy to review/cherry-pick.