feat(core): require TransactionRunner in OutboxScheduler/OutboxPurger (breaking)#54

Merged
endrju19 merged 2 commits into main from kojak-68-core-require-tx-runner
Jun 5, 2026

@endrju19
Collaborator

Summary

Moves the "claim+deliver runs in a transaction" invariant down into okapi-core, where it belongs. OutboxScheduler and OutboxPurger now require a non-null TransactionRunner constructor parameter; the nullable default and the silent non-transactional fallback in tick() are gone.

Under JDBC auto-commit, FOR UPDATE SKIP LOCKED releases its row lock at the end of the claim SELECT. Concurrent processor instances then see overlapping result sets and silently deliver the same entry multiple times — no log, no exception, no metric. This PR removes that footgun from the core API. Companion to #49, which closed the same hole at the Spring Boot autoconfig layer; this PR completes the story for non-Spring consumers (Ktor, manual Spring wiring, plain Java/Kotlin).
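
To make the removed fallback concrete, here is a minimal sketch of the two constructor shapes. Everything in it is a stand-in: the `TransactionRunner` method name `runInTransaction`, the two scheduler classes, and `RecordingRunner` are assumptions for illustration, not the actual okapi-core code.

```kotlin
// Hypothetical stand-in for okapi-core's TransactionRunner; the real method name may differ.
interface TransactionRunner {
    fun <T> runInTransaction(block: () -> T): T
}

// Assumed shape BEFORE this PR: a null runner silently falls back to a plain call,
// so claim and deliver run outside any transaction and nothing signals it.
class NullableRunnerScheduler(
    private val claimAndDeliver: () -> Int,
    private val transactionRunner: TransactionRunner? = null,
) {
    fun tick(): Int =
        transactionRunner?.runInTransaction { claimAndDeliver() } ?: claimAndDeliver()
}

// Assumed shape AFTER this PR: the runner is required, so the only code path is transactional.
class RequiredRunnerScheduler(
    private val claimAndDeliver: () -> Int,
    private val transactionRunner: TransactionRunner,
) {
    fun tick(): Int = transactionRunner.runInTransaction { claimAndDeliver() }
}

// Test double that counts how many times a transaction was opened.
class RecordingRunner : TransactionRunner {
    var transactions = 0
        private set

    override fun <T> runInTransaction(block: () -> T): T {
        transactions++
        return block()
    }
}

fun main() {
    // Compiles and "works", but no transaction ever wraps the claim.
    val legacy = NullableRunnerScheduler(claimAndDeliver = { 3 })
    println(legacy.tick())

    // Omitting the runner here would be a compile error instead of a silent fallback.
    val runner = RecordingRunner()
    val strict = RequiredRunnerScheduler(claimAndDeliver = { 3 }, transactionRunner = runner)
    println(strict.tick())
    println(runner.transactions)
}
```

The point of the sketch is only that the second shape turns a missing runner into a compile-time failure rather than a runtime behaviour difference.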

Breaking change — pre-1.0, permitted by the project's stated versioning policy.

Changes

  • OutboxScheduler and OutboxPurger: transactionRunner: TransactionRunner? = null → transactionRunner: TransactionRunner (required); drop the ?: fallback in tick(). @JvmOverloads kept (still useful for the remaining default params, config / clock).
  • KDoc on OutboxScheduler documents the FOR UPDATE SKIP LOCKED rationale — the why is non-obvious from the signature alone and would otherwise tempt a future reader to "simplify" back to nullable.
  • Test updates: 5 scheduler + 7 purger construction sites supplied with a noOpTransactionRunner() helper; deleted "tick runs without transactionRunner" (exercised the removed null-fallback); added new test "transactionRunner wraps each batch delete" on the purger — asserts the runner is invoked per batch, not just present (closes a behavioral gap the type system cannot enforce; analogous to the existing scheduler test).
  • Renamed "transactionRunner wraps tick when provided" → "transactionRunner wraps tick" (always provided now).
  • CHANGELOG.md: new ### Changed (BREAKING) entry under [Unreleased]. Migration example uses anonymous-object syntax, not lambda — TransactionRunner is intentionally a plain interface, not fun interface, because Kotlin forbids SAM conversion on generic abstract methods.
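
As an illustration of the anonymous-object syntax and of the per-batch assertion described above, a self-contained sketch follows. The interface method name `runInTransaction`, the body of `noOpTransactionRunner()`, and the `CountingRunner` and `BatchPurger` classes are all assumptions made for this example; the real helper and purger in the repository may look different.

```kotlin
// Hypothetical stand-in: the real okapi-core interface may name its method differently.
interface TransactionRunner {
    fun <T> runInTransaction(block: () -> T): T
}

// Anonymous-object syntax. A lambda cannot be used here: the interface is not a
// `fun interface`, and Kotlin rejects a `fun interface` whose single abstract
// method declares type parameters.
fun noOpTransactionRunner(): TransactionRunner = object : TransactionRunner {
    override fun <T> runInTransaction(block: () -> T): T = block()
}

// Counts invocations so a test can check "invoked per batch", not just "present".
class CountingRunner : TransactionRunner {
    var calls = 0
        private set

    override fun <T> runInTransaction(block: () -> T): T {
        calls++
        return block()
    }
}

// Toy purger: deletes up to batchSize rows per transaction until a short batch is seen.
class BatchPurger(
    private val rows: MutableList<Int>,
    private val batchSize: Int,
    private val transactionRunner: TransactionRunner,
) {
    fun tick(): Int {
        var total = 0
        do {
            val deleted = transactionRunner.runInTransaction {
                val n = minOf(batchSize, rows.size)
                repeat(n) { rows.removeAt(rows.lastIndex) }
                n
            }
            total += deleted
        } while (deleted == batchSize)
        return total
    }
}

fun main() {
    val runner = CountingRunner()
    val purger = BatchPurger(MutableList(250) { it }, batchSize = 100, transactionRunner = runner)
    println(purger.tick())  // 250 rows removed in batches of 100, 100, 50
    println(runner.calls)   // 3, one transaction per batch
}
```

A type-level requirement only guarantees that a runner exists; counting calls is what shows that each batch gets its own transaction.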

What this does NOT change

  • Spring Boot autoconfig: already provides a TransactionRunner after #49 (feat: require TransactionRunner in okapi-spring-boot autoconfig, KOJAK-67); no further changes needed.
  • The TransactionRunner interface itself — unchanged.
  • Existing concrete impls (SpringTransactionRunner, ExposedTransactionRunner).
  • Logging / metrics for the schedulers — out of scope; one related logging improvement for OutboxPurger.tick() was identified during review and deferred to a separate follow-up.

Test plan

  • ./gradlew :okapi-core:test — green
  • ./gradlew :okapi-spring-boot:test — green (adapter still happy)
  • ./gradlew ktlintCheck — clean
  • ./gradlew build — full build green
  • Grep verified: every OutboxScheduler( / OutboxPurger( construction site in the repo now passes an explicit transactionRunner.

Closes #51.

… (breaking)

Drop the nullable default and the silent non-transactional fallback in tick().
Under JDBC auto-commit, FOR UPDATE SKIP LOCKED releases its row lock at the
end of the claim SELECT and concurrent processor instances deliver the same
entry multiple times. Make the invariant explicit at the type level in
okapi-core, complementing #49's adapter-layer fix.

Closes #51

Replace the restating "runs inside transactionRunner" line with the
purger-specific rationale (per-batch transaction boundary: bounded rollback
and no multi-minute sweep transaction) — the why the type signature can't
convey. Drop the CHANGELOG parenthetical about plain-vs-fun interface; the
Kotlin compiler already rejects fun interface on a generic abstract method,
so the note documents an absence the language enforces.
endrju19 added a commit that referenced this pull request Jun 5, 2026
… failure (#55)

## Summary

`OutboxPurger.tick()` purges DELIVERED entries in a `do { ... } while`
loop where each iteration runs an independent transaction. When
iteration N throws (transient DB error, lock timeout, etc.), iterations
0..N-1 committed in their own transactions and **their deletes are
durable** — but the previous error log emitted only the bare exception
with no count, so an operator paged on "Outbox purge failed" had no way
to tell whether 0 or 9000 entries were purged before the failure without
inspecting the database directly.

## Why this matters

- **Operations**: on-call engineer sees `"Outbox purge failed after 7
batches (700 entries purged this tick), will retry at next scheduled
interval"` instead of just `"Outbox purge failed, ..."`. Tells them at a
glance whether the failure was early (real outage) or late (transient
hiccup after most of the work was done).
- **Throughput observability**: the next tick, one `config.interval` later,
will re-issue the purge from a fresh cutoff. If N batches committed before
the failure, the next tick has N batches' worth fewer rows to scan. Logging
the count makes this visible without DB inspection.

## What changed

- `OutboxPurger.kt` — hoist `totalDeleted` and `batches` out of the
`try` block so the `catch` handler can include them as logger arguments.
No behavioural change beyond the log message format — the broad catch is
correctly placed at the scheduler-task boundary (must stay broad to
prevent `ScheduledExecutorService` from cancelling future ticks).

- `okapi-core/build.gradle.kts` — swap
`testRuntimeOnly(libs.slf4jSimple)` for
`testImplementation(libs.logbackClassic)`. Logback is now the sole SLF4J
binding on the test classpath (no multi-binding warning) and is required
by the new `ListAppender` capture in the test. Production code stays
free of any logging backend.

- `OutboxPurgerTest.kt` — new test `"error log preserves partial batch
progress on mid-loop failure"`. Asserts on the typed argument array
(`Int batches`, `Int totalDeleted`) attached to the captured
`ILoggingEvent`, not on the formatted message text — so future
log-wording tweaks won't break the regression check.
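
The hoisted-counter pattern can be sketched as follows. This is an illustrative reduction, not the code in `OutboxPurger.kt`: the `LoggingPurger` class, the `deleteBatch` lambda, and the `onFailure` callback (standing in for the logger call with typed arguments) are hypothetical names introduced here.

```kotlin
// Toy purger showing counters declared outside `try` so the catch handler can report them.
class LoggingPurger(
    private val deleteBatch: () -> Int, // rows deleted by one independent transaction
    private val batchSize: Int,
    // Stand-in for the error log call; receives typed values rather than a formatted string.
    private val onFailure: (batches: Int, totalDeleted: Int, cause: Exception) -> Unit,
) {
    fun tick() {
        // Hoisted: still in scope inside `catch`.
        var totalDeleted = 0
        var batches = 0
        try {
            do {
                val deleted = deleteBatch()
                totalDeleted += deleted
                batches++
            } while (deleted == batchSize)
        } catch (e: Exception) {
            // Broad on purpose: an exception escaping a scheduled task would stop future runs.
            onFailure(batches, totalDeleted, e)
        }
    }
}

fun main() {
    var calls = 0
    val failures = mutableListOf<Pair<Int, Int>>()
    val purger = LoggingPurger(
        // Seven full batches commit, then the eighth call fails.
        deleteBatch = { if (++calls > 7) throw IllegalStateException("lock timeout") else 100 },
        batchSize = 100,
        onFailure = { batches, total, _ -> failures.add(batches to total) },
    )
    purger.tick()
    println(failures.single()) // (7, 700)
}
```

Asserting on the `(batches, totalDeleted)` pair rather than on message text mirrors the test's choice to check the typed argument array.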

## How this was identified

Multi-agent review of #54 (`pr-review-toolkit:silent-failure-hunter`)
flagged this as MEDIUM. #54's body explicitly defers the logging
improvement to a separate follow-up; this PR is that follow-up.

## Out of scope

- The analogous `OutboxScheduler.tick()` catch — no batch loop, no
partial-progress information to lose.
- Restructuring the broad catch (correctly placed at the scheduler task
boundary).
- Telemetry / metrics — Micrometer counters for the purger live in
`okapi-micrometer`; this PR is logging-only.

## Compatibility with #54

Branches off `main` (where `transactionRunner` is still nullable with
elvis fallback). When #54 lands, this commit rebases trivially — the
hoisted-counter pattern is structurally identical regardless of whether
the runner is nullable.

## Test plan

- [x] `./gradlew :okapi-core:test` — green
- [x] `./gradlew ktlintCheck` — green
- [x] `./gradlew build` — full build green (all tests, all modules)
- [x] Manual check: only the purger's error message string changed; the
scheduler's similar message untouched
@endrju19 endrju19 merged commit b991083 into main Jun 5, 2026
8 checks passed
@endrju19 endrju19 deleted the kojak-68-core-require-tx-runner branch June 5, 2026 10:29
Development

Successfully merging this pull request may close these issues.

core: require non-null TransactionRunner in OutboxScheduler / OutboxPurger (breaking)
