
feat(pegs): L3 oracle-events consumer — stables/oracle_events.py (#302)#307

Draft
spalen0 wants to merge 4 commits into feat/peg-monitoring-l2-oracles from feat/peg-monitoring-l3-events

spalen0 commented Jun 29, 2026


Closes #302. Layer 3 (consumer side) of peg monitoring.

⚠️ Stacked on #306 (L2). Base is feat/peg-monitoring-l2-oracles; the diff below is L3-only. The chain is #305 → #306 → this. Each retargets to main as its parent merges.

What this is

An Envio GraphQL consumer that reads Chainlink AnswerUpdated(current, roundId, updatedAt) rows indexed by chain-events/yearn-indexing-test (issue #31) and turns per-feed anomalies into alerts. Where L2 polls the current round hourly, L3 consumes the full event stream, so no round is missed between polls. It mirrors protocols/timelock/timelock_alerts.py.

Detection (pure, unit-tested)

| Anomaly | Rule | Severity |
|---|---|---|
| Large round-over-round jump | abs(Δanswer) / prev ≥ JUMP_THRESHOLD (default 10%, env-tunable) | HIGH |
| Missed-heartbeat gap | updatedAt gap between consecutive rounds > heartbeat + buffer | HIGH |
| Sequence anomaly | roundId does not strictly increase | CRITICAL |

Jumps are percentage-based (unit-independent), so no per-feed decimals are needed, and BTC-denominated feeds aren't false-flagged as they would be under an absolute threshold.
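A minimal sketch of the percentage-based jump rule, to show why no per-feed decimals are needed: answers are compared as raw integers, and the 10**decimals scale cancels in the ratio. The name is_large_jump, the use of abs(prev) in the denominator, and the zero-previous handling are assumptions for illustration, not the module's actual code.

```python
from decimal import Decimal

# Default from the PR description; the real threshold is env-tunable.
JUMP_THRESHOLD = Decimal("0.10")


def is_large_jump(prev_answer: int, answer: int, threshold: Decimal = JUMP_THRESHOLD) -> bool:
    """Return True when abs(answer - prev) / abs(prev) >= threshold.

    Both answers are raw AnswerUpdated integers (price * 10**decimals).
    The ratio cancels the 10**decimals factor, so decimals are never needed.
    """
    if prev_answer == 0:
        # Assumption: a ratio against zero is undefined, so any move off zero counts.
        return answer != 0
    change = Decimal(abs(answer - prev_answer)) / Decimal(abs(prev_answer))
    return change >= threshold


# The same 11% drop at 8 and at 18 decimals: both flagged.
print(is_large_jump(100_000_000, 89_000_000))  # True
print(is_large_jump(10**18, 89 * 10**16))      # True
# A 5% move stays below the default threshold.
print(is_large_jump(100_000_000, 95_000_000))  # False
```

An absolute threshold would need a per-feed scale to treat these two feeds alike; the ratio needs none.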

Routing & dedupe

  • Alerts use PeggedAsset.protocol + channel (the L2 fix), so emergency dispatch fires for ethena/cap/infinifi.
  • De-dupe via a per-aggregator blockTimestamp cursor. A small context window of rounds before each cursor is fetched so the first new round has a prior round to diff against; alerts fire only for rounds strictly newer than the cursor, so reruns never re-alert the same round (acceptance criterion). The cursor advances only when every send lands (same trade-off as timelock_alerts: re-alerting on retry is acceptable, dropping an alert is not).
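The cursor and context-window interplay can be sketched as follows, assuming one aggregator's rounds arrive already sorted in emission order and begin with a few context rounds at or before the cursor. Round, process, check and send are hypothetical names: check stands in for the anomaly rules and send returns whether delivery succeeded.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Round:
    round_id: int
    answer: int
    block_timestamp: int


def process(
    rounds: Sequence[Round],
    cursor: int,
    check: Callable[[Round, Round], Optional[str]],
    send: Callable[[str], bool],
) -> int:
    """Alert on rounds newer than `cursor` and return the next cursor.

    `rounds` is sorted in emission order and starts with context rounds at or
    before the cursor, so the first new round still has a predecessor to diff.
    """
    all_sent = True
    for prev, cur in zip(rounds, rounds[1:]):
        if cur.block_timestamp <= cursor:
            continue  # context only: this round was judged on an earlier run
        message = check(prev, cur)
        if message is not None and not send(message):
            all_sent = False
    if not all_sent:
        # Keep the old cursor: a re-alert on retry beats a dropped alert.
        return cursor
    return max([cursor, *(r.block_timestamp for r in rounds)])
```

A second run with the returned cursor re-reads the same rows purely as context and sends nothing; a failed send leaves the cursor where it was, so the next run retries those rounds.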

Address sourcing (the subtle part, per indexer #31)

AnswerUpdated is emitted by the underlying aggregator (which rotates on phase upgrades), and its indexed params are current/roundId, not the feed address, so Envio can't scope it by feed. The consumer resolves each feed proxy's current aggregator() on-chain (batched) and maps aggregator → asset, so it always tracks the live aggregator. A phase rotation naturally surfaces as a staleness gap in L2.
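A sketch of the aggregator → asset mapping, assuming a hypothetical batch_call that performs one batched aggregator() read (for example a multicall) and returns addresses in input order. The row key srcAddress and the helper names build_aggregator_index and route_rows are assumptions, not the indexer's schema or the module's API.

```python
from typing import Any, Callable, Mapping, Sequence

# Hypothetical batched reader: given feed proxy addresses, return each proxy's
# current aggregator() address in the same order.
BatchAggregatorCall = Callable[[Sequence[str]], Sequence[str]]

# Assumed name of the field holding the emitting contract's address.
_F_AGGREGATOR = "srcAddress"


def build_aggregator_index(
    proxy_to_asset: Mapping[str, str], batch_call: BatchAggregatorCall
) -> dict[str, str]:
    """Map each live aggregator address (lower-cased) to its asset."""
    proxies = list(proxy_to_asset)
    aggregators = batch_call(proxies)
    if len(aggregators) != len(proxies):
        raise ValueError("batched aggregator() read returned a mismatched result count")
    # Lower-case so checksummed and plain hex addresses compare equal.
    return {agg.lower(): proxy_to_asset[proxy] for proxy, agg in zip(proxies, aggregators)}


def route_rows(
    rows: Sequence[Mapping[str, Any]], index: Mapping[str, str]
) -> dict[str, list[Mapping[str, Any]]]:
    """Group AnswerUpdated rows by asset; rows from aggregators not in the
    index (e.g. one retired by a phase rotation) are dropped."""
    by_asset: dict[str, list[Mapping[str, Any]]] = {}
    for row in rows:
        asset = index.get(str(row[_F_AGGREGATOR]).lower())
        if asset is not None:
            by_asset.setdefault(asset, []).append(row)
    return by_asset
```

Because the index is rebuilt from aggregator() on every run, a rotated aggregator is picked up without any config change.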

GraphQL field names are centralised in _F_* constants, so aligning with #31's final schema is a one-line change if the indexer names a field differently.
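A sketch of how centralised _F_* constants keep the query and the row parser in sync. Every field name, the Hasura-style where/order_by/limit arguments and variable types, and the names build_query, ParsedRound and parse_row are assumptions, since #31's final schema isn't settled; only the set of selected values (answer, roundId, updatedAt, block number and timestamp, logIndex) comes from this PR.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Centralised GraphQL names: a rename in the indexer schema is a one-line edit.
# All values below are assumptions pending #31's final schema.
_F_ENTITY = "AnswerUpdated"
_F_AGGREGATOR = "srcAddress"
_F_ANSWER = "current"
_F_ROUND_ID = "roundId"
_F_UPDATED_AT = "updatedAt"
_F_BLOCK_NUMBER = "blockNumber"
_F_BLOCK_TIMESTAMP = "blockTimestamp"
_F_LOG_INDEX = "logIndex"

_SELECTED = (
    _F_AGGREGATOR, _F_ANSWER, _F_ROUND_ID, _F_UPDATED_AT,
    _F_BLOCK_NUMBER, _F_BLOCK_TIMESTAMP, _F_LOG_INDEX,
)


def build_query(limit: int) -> str:
    """Build a Hasura-style query (assumed syntax) ordered by emission order.

    The caller picks $since early enough to include a few context rounds
    before the cursor; how that window is sized is not shown here.
    """
    where = f"{{{_F_AGGREGATOR}: {{_in: $aggregators}}, {_F_BLOCK_TIMESTAMP}: {{_gte: $since}}}}"
    order = f"[{{{_F_BLOCK_NUMBER}: asc}}, {{{_F_LOG_INDEX}: asc}}]"
    return (
        "query Rounds($aggregators: [String!]!, $since: Int!) { "
        f"{_F_ENTITY}(where: {where}, order_by: {order}, limit: {limit}) "
        f"{{ {' '.join(_SELECTED)} }} }}"
    )


@dataclass(frozen=True)
class ParsedRound:
    aggregator: str
    answer: int
    round_id: int
    updated_at: int
    block_number: int
    block_timestamp: int
    log_index: int


def parse_row(row: Mapping[str, Any]) -> ParsedRound:
    # Big integers may arrive as JSON strings or numbers; int() accepts both.
    return ParsedRound(
        aggregator=str(row[_F_AGGREGATOR]).lower(),
        answer=int(row[_F_ANSWER]),
        round_id=int(row[_F_ROUND_ID]),
        updated_at=int(row[_F_UPDATED_AT]),
        block_number=int(row[_F_BLOCK_NUMBER]),
        block_timestamp=int(row[_F_BLOCK_TIMESTAMP]),
        log_index=int(row[_F_LOG_INDEX]),
    )
```

If the indexer ships, say, blockTimestamp under a different name, only the matching constant changes and both the query and the parser follow.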

Automation

Added stables-oracle-events to the hourly profile (renders; 25 tasks). Hourly is sufficient because the cursor plus context window guarantees no missed rounds; it can move to the 10-min profile if lower latency is wanted.

Validation

  • ruff + mypy clean on the new module.
  • 651 passed, 6 skipped (full suite; +14 new L3 tests: jump/gap/sequence detection, dedup gate with prior-round context, cursor advance, row parsing, dispatch routing).
  • Live smoke: resolved all 6 underlying aggregators on-chain (e.g. cbBTC proxy 0x2665… → 0x5e24…), queried the real Envio endpoint, and, since #31's entity isn't indexed yet, degraded gracefully via send_error_message.

Dependency / follow-up

🤖 Generated with Claude Code

spalen0 and others added 4 commits June 29, 2026 21:11
Layer 3 of peg monitoring: an Envio GraphQL consumer that reads Chainlink
AnswerUpdated rows and turns per-feed anomalies into alerts. Stacks on L2 (#301)
for the shared registry routing. Mirrors protocols/timelock/timelock_alerts.py.

Detects per feed (pure, unit-tested functions):
- large round-over-round jumps (|Δanswer|/prev >= JUMP_THRESHOLD, default 10%),
- missed-heartbeat gaps (updatedAt gap > feed heartbeat + buffer),
- sequence anomalies (roundId not strictly increasing -> CRITICAL).

Routing uses PeggedAsset.protocol + channel so alerts reach the owning protocol
and its emergency dispatch (consistent with the L2 fix). De-dupe via a
per-aggregator blockTimestamp cursor, advanced only when every send lands (same
trade-off as timelock_alerts: re-alert on retry is acceptable, dropping is not).

Address sourcing (per indexer issue chain-events/yearn-indexing-test#31):
AnswerUpdated is emitted by the underlying aggregator (rotates on phase upgrades),
which Envio can't scope by feed. The consumer resolves each feed proxy's current
aggregator() on-chain (batched) and maps aggregator -> asset, so it always tracks
the live aggregator. GraphQL field names are centralised in _F_* constants to
align with #31's final schema.

Wired into the hourly profile (renders; 25 tasks). ruff + mypy clean (new files);
651 passed, 6 skipped. Live smoke resolved all 6 aggregators on-chain and handled
the not-yet-indexed entity gracefully via send_error_message.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#302)

detect_anomalies sorted rounds by (block_timestamp, round_id). round_id is the
field whose monotonicity we validate, so using it as the sort tiebreaker reorders
same-block events into apparent monotonicity and hides a backwards round: 102
followed by 101 at the same block_timestamp sorted to 101->102 and produced 0
alerts.

Carry blockNumber + logIndex through parse_round (and select logIndex in the
GraphQL query), and sort by OracleRound.event_order = (block_number, log_index),
the canonical on-chain emission order. block_timestamp remains the dedup cursor.

Add a regression test: a backwards round sharing a block_timestamp, distinguished
only by logIndex, now produces a CRITICAL sequence-anomaly alert.

Full suite: 652 passed, 6 skipped. ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
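The tiebreaker bug and its fix can be reproduced in a few lines. RoundEvent is a simplified, hypothetical stand-in for OracleRound, and sequence_anomalies takes the sort key as a parameter so the old (block_timestamp, round_id) key and the new event_order key can be compared on the same input.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class RoundEvent:
    round_id: int
    block_timestamp: int
    block_number: int
    log_index: int

    @property
    def event_order(self) -> tuple[int, int]:
        # Canonical emission order: block first, then position within the block.
        return (self.block_number, self.log_index)


def sequence_anomalies(
    rounds: Sequence[RoundEvent],
    key: Callable[[RoundEvent], tuple[int, int]] = lambda r: r.event_order,
) -> list[tuple[int, int]]:
    """Return (prev_round_id, round_id) pairs where roundId fails to increase."""
    ordered = sorted(rounds, key=key)
    return [
        (prev.round_id, cur.round_id)
        for prev, cur in zip(ordered, ordered[1:])
        if cur.round_id <= prev.round_id
    ]


# Round 102 is logged before round 101 in the same block (same timestamp).
same_block = [RoundEvent(102, 1_000, 50, 3), RoundEvent(101, 1_000, 50, 5)]

# Old key: the field under test doubles as the tiebreaker, hiding the bug.
print(sequence_anomalies(same_block, key=lambda r: (r.block_timestamp, r.round_id)))  # []
print(sequence_anomalies(same_block))  # [(102, 101)]
```

With the old key the two rounds sort to 101 → 102 and nothing is reported; with event_order they keep their emission order, 102 → 101, and the backwards round is caught.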
The sequence-anomaly (roundId) and missed-heartbeat-gap (updatedAt) checks in
oracle_events.py rely on a feed reporting reliable round metadata. Honor the
same ChainlinkFeed.reports_round_metadata flag L2 uses so feeds that return
constant or zero roundId/updatedAt don't false-positive; the answer-based jump
check always runs. Also warn when an AnswerUpdated query hits its row limit
(newest rounds deferred to the next run). Adds gating tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
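A sketch of the metadata gating and the row-limit warning. FeedConfig is a hypothetical stand-in for ChainlinkFeed (only the reports_round_metadata flag comes from the source), and GAP_BUFFER_SECONDS, Obs, checks_for_pair and warn_if_truncated are illustrative names and values. The jump check always runs; the roundId and updatedAt checks run only when the feed reports reliable metadata.

```python
import logging
from dataclasses import dataclass
from typing import Sequence

logger = logging.getLogger("oracle_events_sketch")

JUMP_THRESHOLD = 0.10
GAP_BUFFER_SECONDS = 300  # hypothetical buffer on top of the feed heartbeat


@dataclass(frozen=True)
class FeedConfig:
    """Hypothetical stand-in for ChainlinkFeed."""
    heartbeat_seconds: int
    reports_round_metadata: bool = True


@dataclass(frozen=True)
class Obs:
    round_id: int
    answer: int
    updated_at: int


def checks_for_pair(feed: FeedConfig, prev: Obs, cur: Obs) -> list[str]:
    findings: list[str] = []
    # Answer-based jump check: always runs.
    if prev.answer != 0 and abs(cur.answer - prev.answer) / abs(prev.answer) >= JUMP_THRESHOLD:
        findings.append("jump")
    # roundId/updatedAt checks: skipped for feeds with constant or zero metadata.
    if feed.reports_round_metadata:
        if cur.round_id <= prev.round_id:
            findings.append("sequence")
        if cur.updated_at - prev.updated_at > feed.heartbeat_seconds + GAP_BUFFER_SECONDS:
            findings.append("gap")
    return findings


def warn_if_truncated(rows: Sequence[object], limit: int, aggregator: str) -> bool:
    """Warn when a query filled its row limit, since newer rounds were deferred."""
    if len(rows) >= limit:
        logger.warning(
            "AnswerUpdated query for %s hit its %d-row limit; newest rounds deferred to the next run",
            aggregator, limit,
        )
        return True
    return False
```

With gating off, a feed that always reports roundId 0 still gets the jump check but can no longer trip the sequence or gap rules.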