fix(payout): harden BTC amount serialization and add retry circuit-breaker#3789
Open
TaprootFreak wants to merge 3 commits into
Open
fix(payout): harden BTC amount serialization and add retry circuit-breaker#3789TaprootFreak wants to merge 3 commits into
TaprootFreak wants to merge 3 commits into
Conversation
…eaker PR #3729 fixed only the fee_rate field. The amounts field in send/sendmany RPC is parsed with ParseFixedPoint(decimals=8) and silently rejects values with more precision (e.g. 0.30000000000000004 from JS float arithmetic) via "Invalid amount" (error code -3). On 2026-05-29 two BTC BuyCrypto payouts stalled for 43 min and 8 min in stochastic retry loops before random floating-point rolls let them through (Orders 106888, 106891). This change applies three complementary hardenings: * Quantize every payout amount to 8 decimals in PayoutBitcoinService.sendUtxoToMany before the RPC call and reject NaN/Infinity/non-positive amounts with a structured InvalidPayoutAmountException, so a bad input fails fast with a clear error instead of being lost as an opaque "Invalid amount" inside Bitcoin Core. fee_rate is re-rounded to 3 decimals as defense-in-depth. * Persist a retry counter, last error and last-attempt timestamp on payout_order (new migration AddPayoutOrderRetryTracking) so operators can see which orders are chronically failing instead of inferring it from log pattern matching. * Fire an operator alert via the existing ErrorMonitoringMail pattern as soon as a payout group has failed 5+ times (~2.5 min of silent retry). suppressRecurring + 1h debounce keep the inbox quiet on long incidents. Tracking + alert live in BitcoinBasedStrategy.send() so they cover every UTXO-derived chain that inherits the base; the amount sanitization is deliberately scoped to BTC.
* notification options: drop suppressRecurring, keep debounce 1h — the entity short-circuits `suppressRecurring || isDebounced` so combining both would silence every retry forever instead of giving 1 alert/group/hour * correlationId sorts ids before joining — Postgres findBy has no ORDER BY, so an unordered group would generate a fresh correlationId each retry and defeat the debounce * min → max threshold semantic — a fresh order joining a stuck group should not silence the alert; the underlying RPC error fails the whole sendmany call, so any single order over threshold is signal worth surfacing * guard empty orders array — Math.min/max on [] yields ±Infinity which would otherwise crash the alert path on orders[0].asset.name * tighten the quantization test — assert the post-round value is observably distinct from the un-rounded raw and lies on the 8-decimal grid * InvalidPayoutAmountException sets this.name so logs identify the class * rename recordPayoutFailure parameter `error: string` → `message: string` to match what it actually receives * move trackPayoutFailure + sendRecurringPayoutFailureAlert under the existing //*** HELPER METHODS ***// section header for consistency
…(not suppressRecurring)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Context
Today (2026-05-29) two BTC
BuyCryptopayouts stalled for 43 min (Order 106888 / BuyCrypto 123873) and 8 min (Order 106891 / BuyCrypto 123874) in silent retry loops before random JS floating-point rolls happened to produce values Bitcoin Core'sParseFixedPointaccepted.PR #3729 (merged 2026-05-29 15:59) fixed the
fee_ratefield — the immediate trigger. But three systemic gaps remain:amountsare still unsanitized. Bitcoin Core parses amount fields withParseFixedPoint(decimals=8). Float artifacts like0.1 + 0.2 = 0.30000000000000004(after aggregation or fee-adjustments downstream ofBitcoinBasedStrategy.aggregatePayout) trigger the sameInvalid amount(error code -3) at a different field.This PR fixes all three.
Changes
1. Amount sanitization + structured validation (BTC only)
PayoutBitcoinService.sendUtxoToManynow quantizes every payout amount to 8 decimals withUtil.round(x, 8)before the RPC call.fee_rateis re-rounded to 3 decimals as defense-in-depth — independent of fix(bitcoin): round send fee_rate to 3 decimals #3729, so a future regression on the fee service can't bring the bug back.InvalidPayoutAmountException: NaN, Infinity, zero or negative amounts now fail fast at the boundary with a clear error message instead of being lost as an opaqueInvalid amountfrom the node.2. Persistent retry tracking (all UTXO chains)
AddPayoutOrderRetryTrackingadds three columns topayout_order:retryCount(int, default 0),lastError(varchar 2048),lastAttemptDate(timestamp).PayoutOrder.recordPayoutFailure(error)/resetPayoutRetry()helpers manage the lifecycle.BitcoinBasedStrategy.send()increments the counter on RPC failure and resets it on successful broadcast.3. Operator alert via existing
ErrorMonitoringMailpatternBitcoinBasedStrategyfires aMailType.ERROR_MONITORINGwithMailContext.PAYOUTandisLiqMail: truetoliq@dfx.swiss.suppressRecurring: true+debounce: 3600000cap the inbox at 1 alert per order-group per hour during long incidents.sendNonRecoverableErrorMail, so no new infrastructure needed.Scope decisions
PayoutBitcoinService— Firo and other UTXO chains inheritBitcoinRpcClientbut have potentially different fee/amount conventions, so the RPC wrapper itself stays untouched.BitcoinBasedStrategy(base) — semantics-free, every UTXO chain that inherits gets the same protection.Verification
Local CI parity:
npm run lint— cleannpm run format:check— cleannpm run build— passnpx jest --testPathPattern='bitcoin-fee|payout'— 109/109 tests green, 7 suitespayout-bitcoin.service.spec.ts, 5 inpayout-bitcoin-based.strategy.spec.ts)Post-deploy on DEV:
liq@dfx.swissafter 5 retries.SELECT id, retryCount, lastError, lastAttemptDate FROM payout_order ORDER BY id DESC LIMIT 10— counter visible.Related