fix: backfill historical keyper sets on startup #712
The keyperset syncer only fetched the currently-active keyper set and any future ones, silently skipping all sets that had been active before the keyper started. On a fresh install or after extended downtime, missed keyper sets were never stored. On restart, most keyper sets are already in the DB. The syncer now reads existing indices at startup and only fetches the gaps, avoiding redundant contract calls.
Fixes #713
if err != nil {
	return nil, err
var initialKeyperSets []*event.KeyperSet
for _, r := range complementRanges(s.KnownRanges, numKS) {
Could we perhaps avoid complementRanges and just loop over the contract indices? It would be simpler to read and understand, and would reduce the amount of logic and test coverage needed here.
Since numKS is the contract count, we can do for i := uint64(0); i < numKS; i++, skip indices already present in the DB, and fetch the rest. That also guarantees we never fetch outside [0, numKS), which can happen with the current logic if the known DB ranges are ahead of the contract snapshot.
blockchainluffy left a comment
lgtm overall, just concerned about stuck message risk for new joiners:
when a new keyper joins at index N, handleOnChainKeyperSetChanges queues NewBatchConfig submissions for indices 1..N. Shuttermint rejects them with not allowed to vote (app/app.go:407). These messages retry forever — isRetrieable is a no-op (fx/send.go:52) — and since GetNextShutterMessage is FIFO and SendShutterMessages bails on first error, the stuck messages block all subsequent outgoing messages, including the joiner's own eon-N check-in and DKG protocol messages.
Retries only stop once smobserver syncs the shuttermint history far enough to hit DeleteShutterMessageByDesc (smstate.go:280) for each historical BatchConfig event.