fix(db): avoid long flush stall on restart by EddieHouston · Pull Request #211 · Blockstream/electrs

EddieHouston · 2026-04-23T12:03:52Z

Summary

Fix a freeze that can occur on restart of a mature electrs DB built from 91b883a or later.

enable_auto_compaction() was tightening level0_stop_writes_trigger from the bulk-load value (512) to the RocksDB default (36) on every update() call, not just once. DB::open applies bulk-load triggers at startup, so after any restart L0 can legitimately hold more than 36 files. The reset instantly put the DB into pre-flush stall territory, and the end-of-batch db.flush() one call later parked inside WaitUntilFlushWouldNotStallWrites until background compaction brought L0 below 36.
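The stall condition described above can be illustrated with a toy model. This is a minimal sketch, not RocksDB or electrs code: `L0State`, `would_stall`, and `files_when_unblocked` are hypothetical names, the count of 120 L0 files is made up, and the rule "stall once the L0 file count reaches the stop trigger" is a simplification of RocksDB's stall logic.

```rust
/// Toy model of the L0 stop condition. This is not RocksDB code.
#[derive(Debug, Clone, Copy)]
struct L0State {
    files: u32,
    stop_writes_trigger: u32,
}

impl L0State {
    /// Simplified rule: writes stop once the L0 file count reaches the trigger.
    fn would_stall(&self) -> bool {
        self.files >= self.stop_writes_trigger
    }
}

/// Simulates background compaction removing one L0 file at a time and
/// returns the file count at which a waiting flush could proceed.
fn files_when_unblocked(mut l0: L0State) -> u32 {
    while l0.would_stall() && l0.files > 0 {
        l0.files -= 1;
    }
    l0.files
}

fn main() {
    // After a restart the bulk-load trigger (512) is in effect, so a
    // hypothetical 120 L0 files is well within limits.
    let mut l0 = L0State { files: 120, stop_writes_trigger: 512 };
    assert!(!l0.would_stall());

    // The old behaviour tightened the trigger to 36 on every update().
    l0.stop_writes_trigger = 36;
    assert!(l0.would_stall());

    // The flush has to wait until compaction brings L0 below 36.
    println!("flush can proceed at {} L0 files", files_when_unblocked(l0));
    // prints "flush can proceed at 35 L0 files"
}
```

The point of the model is that nothing about the data changes between the two assertions. Only the threshold moves, and the DB goes from healthy to stalled in one option update.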

On testnet, this could freeze the indexer for over an hour on a restart.

Fix

Split enable_auto_compaction() into:

enable_auto_compaction() — minimal flag flip (disable_auto_compactions=false), safe to call on every update().
apply_steady_state_triggers() — holds the L0 trigger and pending-bytes-limit resets. Documented as unsafe to call while L0 is populated.

start_auto_compactions() in schema.rs now calls apply_steady_state_triggers() inside the F sentinel gate, immediately after full_compaction() drains L0. The tight triggers therefore apply exactly once per DB lifetime, and only in a DB state that won't stall under them.

On restarts where F is already set, triggers stay at bulk-load values — the comment in DB::open already argues that configuration is fine for steady-state reads given the prefix bloom filters added in 1e7da26 and 2c33745.
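The gating described above can be sketched against a mock. This is an illustration under stated assumptions, not the code in this PR: `MockDb` and its fields are hypothetical stand-ins for the real RocksDB handle, the point at which the F sentinel is written is assumed, and the pending-bytes-limit resets are left out for brevity. Only the function names and the 512 / 36 trigger values come from the description.

```rust
const BULK_LOAD_STOP_TRIGGER: u32 = 512; // applied at open, per the description
const STEADY_STATE_STOP_TRIGGER: u32 = 36; // RocksDB default, per the description

/// Hypothetical stand-in for the DB handle; it records settings only.
struct MockDb {
    auto_compaction: bool,
    level0_stop_writes_trigger: u32,
    l0_files: u32,
    f_sentinel: bool, // models the "F" (full compaction done) marker
}

impl MockDb {
    fn open(l0_files: u32, f_sentinel: bool) -> Self {
        MockDb {
            auto_compaction: false,
            level0_stop_writes_trigger: BULK_LOAD_STOP_TRIGGER,
            l0_files,
            f_sentinel,
        }
    }

    /// Minimal flag flip; harmless to repeat on every update().
    fn enable_auto_compaction(&mut self) {
        self.auto_compaction = true;
    }

    /// Tightens the trigger; only safe while L0 is empty.
    fn apply_steady_state_triggers(&mut self) {
        self.level0_stop_writes_trigger = STEADY_STATE_STOP_TRIGGER;
    }

    fn full_compaction(&mut self) {
        self.l0_files = 0; // the model drains L0 completely
    }

    /// Assumed shape of the gate: tighten once, right after L0 is drained.
    fn start_auto_compactions(&mut self) {
        if !self.f_sentinel {
            self.full_compaction();
            self.apply_steady_state_triggers();
            self.f_sentinel = true;
        }
        self.enable_auto_compaction();
    }

    fn would_stall(&self) -> bool {
        self.l0_files >= self.level0_stop_writes_trigger
    }
}

fn main() {
    // First run after initial sync: F not set, L0 is drained before tightening.
    let mut fresh = MockDb::open(300, false);
    fresh.start_auto_compactions();
    assert_eq!(fresh.level0_stop_writes_trigger, 36);
    assert!(!fresh.would_stall());

    // Restart with F already set: triggers stay at the bulk-load value.
    let mut restarted = MockDb::open(120, true);
    restarted.start_auto_compactions();
    assert_eq!(restarted.level0_stop_writes_trigger, 512);
    assert!(restarted.auto_compaction && !restarted.would_stall());
}
```

In this model the restarted DB keeps 120 L0 files and never crosses the stop trigger, which is the behaviour the fix aims for.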

Test plan

cargo check clean
cargo test --lib — 8/8 pass, including new_index::db::tests::*
cargo test --test electrum — 4/4 pass
cargo test --test rest — 22/22 pass
Deploy to testnet and verify that, on a restart with more than 36 L0 files, the first flush of history_db to disk completes in under a second
Confirm that the 'Manual flush start' and 'Manual flush finished' timestamps in the RocksDB LOG are within a few ms of each other on the restart
Verify synced tip resumes advancing from ZMQ notifications immediately after restart
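The LOG timestamp check in the plan above could be scripted roughly as follows. This is a sketch under assumptions: it assumes each LOG line starts with a `YYYY/MM/DD-HH:MM:SS.uuuuuu` timestamp with six fractional digits, it matches the two messages by substring, and it ignores flushes that span midnight. The function names are hypothetical, and `SAMPLE` is synthetic text written for this example, not copied from a real LOG.

```rust
/// Synthetic two-line sample in the assumed layout (not real log output).
const SAMPLE: &str = "\
2026/04/23-12:00:00.000100 1 [default] Manual flush start.
2026/04/23-12:00:00.004100 1 [default] Manual flush finished, status: OK
";

/// Returns microseconds since midnight for a line that starts with
/// "YYYY/MM/DD-HH:MM:SS.uuuuuu", or None if the line does not fit.
fn micros_of_day(line: &str) -> Option<u64> {
    let stamp = line.split_whitespace().next()?;
    let (_date, time) = stamp.split_once('-')?;
    let (hms, micros) = time.split_once('.')?;
    let mut parts = hms.split(':');
    let h: u64 = parts.next()?.parse().ok()?;
    let m: u64 = parts.next()?.parse().ok()?;
    let s: u64 = parts.next()?.parse().ok()?;
    let us: u64 = micros.parse().ok()?;
    Some(((h * 60 + m) * 60 + s) * 1_000_000 + us)
}

/// Microseconds between the most recent "Manual flush start" line and the
/// first "Manual flush finished" line that follows it.
fn manual_flush_micros(log: &str) -> Option<u64> {
    let mut start = None;
    for line in log.lines() {
        if line.contains("Manual flush start") {
            start = micros_of_day(line);
        } else if line.contains("Manual flush finished") {
            if let Some(s) = start {
                return micros_of_day(line)?.checked_sub(s);
            }
        }
    }
    None
}

fn main() {
    match manual_flush_micros(SAMPLE) {
        Some(us) => println!("manual flush took {} us", us),
        None => println!("no complete manual flush found"),
    }
}
```

Against a real LOG, a result in the low thousands of microseconds would match the "within a few ms" expectation, while a stalled restart would show a gap of many minutes.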

enable_auto_compaction() was lowering level0_stop_writes_trigger from the bulk-load value (512) to the RocksDB default (36) on every update() call. At DB open the bulk-load triggers are applied by DB::open, so on any restart L0 can legitimately hold more than 36 files. When the first post-restart update() called enable_auto_compaction(), the trigger tightening instantly put the DB into pre-flush stall territory, and the end-of-batch db.flush() that follows parked inside WaitUntilFlushWouldNotStallWrites waiting for background compaction to bring L0 below 36.

On production testnet this reliably cost 77 minutes of indexer freeze per restart (verified by 'Manual flush start' → 'Manual flush finished' in the RocksDB LOG). The actual memtable flush took 62 ms once unblocked; the rest was wait.

Split enable_auto_compaction() into the minimal flag-flip and a new apply_steady_state_triggers() that holds the L0 trigger / pending-bytes-limit reset. Invoke the latter exactly once per DB lifetime, inside the F-sentinel gate in start_auto_compactions(), immediately after full_compaction() has drained L0. On DBs where F is already set (steady-state restart), triggers stay at bulk-load values; the comment in DB::open already argues that configuration is fine for steady-state reads given the prefix bloom filters.

EddieHouston requested review from DeviaVir, Randy808 and philippem April 23, 2026 12:03

EddieHouston self-assigned this Apr 23, 2026

EddieHouston force-pushed the fix/restart-flush-stall branch from d102699 to bef02e3 Compare April 23, 2026 12:10

EddieHouston wants to merge 1 commit into Blockstream:new-index from EddieHouston:fix/restart-flush-stall
