Use CLOCK (second-chance) eviction in the UFFD page cache#274
Use CLOCK (second-chance) eviction in the UFFD page cache#274sjmiller609 wants to merge 5 commits into
Conversation
Self-contained benchmark package comparing eviction policies for the pager's shared page cache under a workload that models many VMs restoring from a few shared backing snapshots (skewed popularity, a shared hot core per snapshot, and a one-shot scan polluter). Policies: prod (wraps the shipping cache), fifo (current behavior), lru, clock (second-chance), tinylfu (W-TinyLFU-lite), and ristretto. Read-path locking reflects each policy's real requirement so concurrency is measured faithfully. Includes a hit-rate comparison, a hot-shared-key read benchmark across shard counts and parallelism, and a p99-latency-under-concurrency test. Not wired into the pager; experiment only. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Models the observed production shape: large per-snapshot working set (cache holds only ~5 snapshots) with many distinct snapshots, so the cache is heavily oversubscribed. Adds a Bursty config to compare sequential per-snapshot fan-out against concurrent fan-out across snapshots — the temporal pattern that flips the recency-vs-frequency tradeoff. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Models a stable always-hot snapshot set reused across the whole timeline alongside customer-specific snapshots that fan out in waves then go idle. Reports hit rate split by hot vs tail so we can see which policy keeps the hot set resident while customers churn. Adds explicit per-snapshot fork counts and a hot/tail schedule to the generator. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Borrow now marks an atomic reference bit under the read lock instead of leaving recency untracked, and eviction sweeps from the back giving referenced entries one reprieve before reclaiming the first unreferenced one. This keeps continuously-faulted snapshot pages resident while idle snapshots fall out, without taking a write lock on the read path, so concurrent faults on hot pages don't serialize. Bumps the pager VERSION since a runtime file changed.
Drops the throwaway lib/uffdpager/cachebench benchmark package and the ristretto dependency it pulled in, now that CLOCK is implemented directly.
|
Created a monitoring plan for this PR. What this PR does: Replaces the UFFD page cache eviction policy in hypeman from LRU to CLOCK (second-chance), reducing write-lock contention on the cache hot path under concurrent page faults. Intended effect:
Risks:
Status updates will be posted automatically on this PR as monitoring progresses. |
|
Created a monitoring plan for this PR. What this PR does: Improves instance spawn throughput under concurrent workloads by switching hypeman's internal page cache from LRU to CLOCK (second-chance) eviction, reducing lock contention on the cache hit path. Intended effect:
Risks:
Status updates will be posted automatically on this PR as monitoring progresses. |
rgarcia
left a comment
There was a problem hiding this comment.
reviewed the CLOCK switch end to end — read-path change is clean and the version bump is correct. one thing i'd like confirmed (a saturation edge case) plus a couple of test-coverage gaps. nothing here blocks; leaning comment-not-changes since the main item is "is this intended" rather than a definite defect.
question — does a just-faulted page get evicted on insert under a saturated hot shard?
Add pushes new entries at the front with ref=false (lib/uffdpager/cache.go:128), but the eviction sweep also moves spared (referenced) entries to the front (lib/uffdpager/cache.go:172) — ahead of the entry just pushed. if every resident page in the shard is referenced, the sweep relocates all of them in front of the new page, the new page ends up at the back, and (bit still false) gets evicted on that same Add. trace, 2-page shard, p1/p2 both hot:
Add(p3): ring [p3(0), p1(1), p2(1)] bytes 3 > 2
back=p2(1) → clear, MoveToFront → [p2(0), p3(0), p1(1)]
back=p1(1) → clear, MoveToFront → [p1(0), p2(0), p3(0)]
back=p3(0) → EVICT p3
so the page just read from backing (lib/uffdpager/server_faults_linux.go:197) is dropped, and the next fault for it misses → re-reads from disk → re-adds → evicted again. it only triggers when the shard is fully referenced (if any resident entry is unreferenced, that one is the victim and the new page survives) — but the "continuously-faulted hot set that saturates a shard" workload this PR targets is exactly when that can happen, and a newly-hot page can't get a foothold.
conventional CLOCK fix is to insert with the bit set (PushFront then ref.Store(true), mirroring the update path) so a new page gets one grace sweep. costs one extra sweep before a genuine one-off ages out. your call — but if the current behavior is intentional it's worth a comment, because it reads as accidental.
question (design) — sampled LRU considered?
agree the list-LRU read-side write lock is genuinely bad here — shared-snapshot restores fault identical keys concurrently, so they'd all serialize on one shard lock — so CLOCK is a reasonable pick. did you weigh redis-style sampled LRU (atomic last-access counter set under the read lock, sample-K-and-evict-oldest)? closer to true-LRU semantics and also avoids the read-side write lock. not advocating a rewrite, just curious whether it was on the table.
nit
lib/uffdpager/cache.go:123vs:128— the ref-bit asymmetry (update path setstrue, insert leavesfalse) is the root of the question above and isn't documented anywhere. one line on thePushFrontwould save the next reader.
test coverage
- the
-raceclaim in the PR body rests on concurrent readers, but all three cache tests are single-goroutine, so-racenever exercises a concurrentref.Store. a small goroutine fan-out hammeringBorrowon one key would make that claim real — the atomic onrefexists specifically for that path. TestPageCacheSecondChanceEvictsUnreferencedcovers "one spared + one unreferenced victim" but not the fully-referenced sweep (all bits set → full clear pass → eviction), which is both the degenerate CLOCK path and the one in the question above. worth a test pinning whatever behavior you settle on.
otherwise lgtm — lookup collapsing to a single RLock path is a nice simplification, and the VERSION bump is required (it's go:embed'd and gates the versioned systemd unit), not cosmetic.
Summary
Switches the pager's shared page cache (
lib/uffdpager/cache.go) from its current insertion-order (FIFO-like) eviction to CLOCK / second-chance, which approximates LRU without taking a write lock on the read path.Borrow(the fault hot path) andGetset it under the read lock — no list reordering — so concurrent faults on the same hot pages don't serialize behind a write lock.Why
Continuously-faulted snapshot pages (the always-hot snapshot set) keep getting their reference bit re-set, so they stay resident, while idle/one-off snapshots age out — the retention behavior we want as the cache oversubscribes across many backing snapshots. The current FIFO behavior ignores reuse and can evict a hot page that's actively being faulted; true LRU would fix that but only by reordering a list under an exclusive lock, which would serialize simultaneous restores. CLOCK gets the recency benefit at the read-lock cost.
Bumps
lib/uffdpager/VERSIONsince a pager runtime file changed.Test plan
go test ./lib/uffdpager/passes-raceclean on the cache tests (the reference bit is set by concurrent readers under RLock)Note
Medium Risk
Changes concurrent cache behavior on the UFFD fault hot path (
Borrow); incorrect CLOCK logic could evict hot pages or hurt restore latency under memory pressure.Overview
Replaces LRU list reordering in the UFFD page cache with CLOCK (second-chance) eviction, so hot snapshot pages are retained without taking a write lock on the fault path.
Each entry gets an atomic reference bit set on Get/Borrow hits under the shard read lock; eviction walks the ring from the back, gives referenced entries one pass (clears the bit and moves them forward), and drops the first unreferenced page until the shard is under budget. Add no longer moves entries to the front on update—it only sets the ref bit. lib/uffdpager/VERSION is bumped to 0.1.3; the eviction test now asserts referenced pages survive over untouched ones.
Reviewed by Cursor Bugbot for commit ef74e39. Bugbot is set up for automated code reviews on this repo. Configure here.