
fix: backfill historical keyper sets on startup #712

Open
jannikluhn wants to merge 2 commits into main from fix-keyper-set-backfill

@jannikluhn
Contributor

The keyperset syncer only fetched the currently-active keyper set and any future ones, silently skipping all sets that had been active before the keyper started. On a fresh install or after extended downtime, missed keyper sets were never stored.

On restart, most keyper sets are already in the DB. The syncer now reads existing indices at startup and only fetches the gaps, avoiding redundant contract calls.
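
The "fetch only the gaps" step can be sketched as a range complement. This is a hypothetical illustration, not the PR's code: it assumes the known DB indices are tracked as half-open `Range` values and that the contract reports `n` keyper sets in total. It also clips every gap to `[0, n)`, so known ranges that run ahead of the contract count never produce out-of-range fetches.

```go
package main

import (
	"fmt"
	"sort"
)

// Range is a half-open interval [Start, End) of keyper set indices.
// Hypothetical type for illustration only.
type Range struct {
	Start, End uint64
}

// complementRanges returns the sub-ranges of [0, n) not covered by known.
// Known ranges may be unsorted or overlapping; gaps are clipped to n.
func complementRanges(known []Range, n uint64) []Range {
	sorted := append([]Range(nil), known...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Start < sorted[j].Start })

	var gaps []Range
	next := uint64(0) // first index not yet known to be covered
	for _, r := range sorted {
		if next >= n {
			break // everything the contract has is already covered
		}
		if r.Start > next {
			end := r.Start
			if end > n {
				end = n // never ask for indices beyond the contract count
			}
			gaps = append(gaps, Range{next, end})
		}
		if r.End > next {
			next = r.End
		}
	}
	if next < n {
		gaps = append(gaps, Range{next, n})
	}
	return gaps
}

func main() {
	known := []Range{{0, 3}, {5, 7}}
	fmt.Println(complementRanges(known, 10)) // [{3 5} {7 10}]
}
```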

@jannikluhn
Contributor Author

Fixes #713

```go
	if err != nil {
		return nil, err
	}
	var initialKeyperSets []*event.KeyperSet
	for _, r := range complementRanges(s.KnownRanges, numKS) {
```

Contributor

Could we perhaps avoid `complementRanges` and just loop over the contract indices? It would be simpler to read and understand, and would reduce the amount of logic and test coverage needed here.

Since `numKS` is the contract count, we can loop with `for i := uint64(0); i < numKS; i++`, skip indices already present in the DB, and fetch the rest. That also guarantees we never fetch outside `[0, numKS)`, which can happen with the current logic if the known DB ranges are ahead of the contract snapshot.
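
A minimal sketch of the suggested loop, assuming the known indices have been loaded into a set and that `fetch` stands in for the contract call. Both are hypothetical, and `KeyperSet` here is a stub rather than the real `event.KeyperSet`:

```go
package main

import "fmt"

// KeyperSet is a hypothetical stub standing in for event.KeyperSet.
type KeyperSet struct {
	Index uint64
}

// missingKeyperSets fetches every contract index in [0, numKS) that is not
// already stored. fetch is a hypothetical stand-in for the contract call.
func missingKeyperSets(numKS uint64, known map[uint64]bool, fetch func(uint64) (*KeyperSet, error)) ([]*KeyperSet, error) {
	var initialKeyperSets []*KeyperSet
	for i := uint64(0); i < numKS; i++ {
		if known[i] {
			continue // already in the DB, no contract call needed
		}
		ks, err := fetch(i)
		if err != nil {
			return nil, err
		}
		initialKeyperSets = append(initialKeyperSets, ks)
	}
	return initialKeyperSets, nil
}

func main() {
	// Index 7 is "known" but lies beyond the contract count of 5.
	known := map[uint64]bool{0: true, 1: true, 3: true, 7: true}
	fetch := func(i uint64) (*KeyperSet, error) { return &KeyperSet{Index: i}, nil }
	sets, err := missingKeyperSets(5, known, fetch)
	if err != nil {
		panic(err)
	}
	for _, ks := range sets {
		fmt.Println(ks.Index) // prints 2, then 4
	}
}
```

Because the loop bound is `numKS` itself, a stale or ahead-of-contract DB entry such as index 7 is simply never visited, so no separate clipping logic is needed.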

@ylembachar mentioned this pull request on Jun 2, 2026
Contributor

@blockchainluffy left a comment

LGTM overall, just concerned about the stuck-message risk for new joiners:

When a new keyper joins at index N, `handleOnChainKeyperSetChanges` queues `NewBatchConfig` submissions for indices 1..N. Shuttermint rejects them with `not allowed to vote` (app/app.go:407). These messages are retried forever, because `isRetrieable` is a no-op (fx/send.go:52). Since `GetNextShutterMessage` is FIFO and `SendShutterMessages` bails on the first error, the stuck messages block all subsequent outgoing messages, including the joiner's own eon-N check-in and DKG protocol messages.

Retries only stop once `smobserver` syncs the Shuttermint history far enough to hit `DeleteShutterMessageByDesc` (smstate.go:280) for each historical `BatchConfig` event.
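
The failure mode can be reproduced with a toy outbox. All names below are hypothetical stand-ins for `GetNextShutterMessage`, `SendShutterMessages`, and `DeleteShutterMessageByDesc`; the sketch only assumes the three properties described above: FIFO order, stop-on-first-error sending, and no retry limit.

```go
package main

import (
	"errors"
	"fmt"
)

// message is a hypothetical queued outgoing message identified by a description.
type message struct {
	desc string
}

// queue models a FIFO outbox: the oldest message is always sent first.
type queue struct {
	msgs []message
}

// errNotAllowed mimics the chain rejecting a vote from a non-member.
var errNotAllowed = errors.New("not allowed to vote")

// sendAll sends messages in FIFO order and stops at the first error,
// leaving the failed message (and everything behind it) queued.
func (q *queue) sendAll(send func(message) error) []string {
	var sent []string
	for len(q.msgs) > 0 {
		if err := send(q.msgs[0]); err != nil {
			return sent // no retry limit: the head stays in place forever
		}
		sent = append(sent, q.msgs[0].desc)
		q.msgs = q.msgs[1:]
	}
	return sent
}

// deleteByDesc drops queued messages with the given description, the way an
// observer might once it sees the corresponding event already on chain.
func (q *queue) deleteByDesc(desc string) {
	kept := q.msgs[:0]
	for _, m := range q.msgs {
		if m.desc != desc {
			kept = append(kept, m)
		}
	}
	q.msgs = kept
}

func main() {
	q := &queue{msgs: []message{{"batch-config-1"}, {"batch-config-2"}, {"eon-check-in"}}}
	send := func(m message) error {
		if m.desc == "batch-config-1" || m.desc == "batch-config-2" {
			return errNotAllowed
		}
		return nil
	}
	fmt.Println(q.sendAll(send)) // []: the check-in is stuck behind batch-config-1
	q.deleteByDesc("batch-config-1")
	q.deleteByDesc("batch-config-2")
	fmt.Println(q.sendAll(send)) // [eon-check-in]
}
```

Nothing reaches the chain until every rejected message ahead of the check-in has been deleted, which is why the joiner depends on the observer catching up.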
