Skip to content

Fix prepare cache eviction: skip defensive copy for completed futures#892

Merged
nikagra merged 1 commit into
scylladb:scylla-4.xfrom
nikagra:fix/prepare-cache-defensive-copy
May 18, 2026
Merged

Fix prepare cache eviction: skip defensive copy for completed futures#892
nikagra merged 1 commit into
scylladb:scylla-4.xfrom
nikagra:fix/prepare-cache-defensive-copy

Conversation

@nikagra
Copy link
Copy Markdown

@nikagra nikagra commented May 18, 2026

Summary

Fixes excessive PREPARE requests caused by premature weak-value cache eviction in CqlPrepareAsyncProcessor.

Problem

The prepare cache uses CacheBuilder.newBuilder().weakValues() to allow GC to reclaim entries when no longer referenced. However, process() unconditionally returned a defensive copy (result.thenApply(x -> x)) instead of the cached CompletableFuture itself. This meant:

  1. The caller's only reference was to the copy, not the cached entry
  2. The cached CF had no strong references after process() returned
  3. Under any GC pressure, the weak-value entry was evicted
  4. Next session.prepare() call triggered a fresh PREPARE to all nodes (with prepare-on-all-nodes=true)

In production with high throughput (23K+ ops/s), frequent young GC caused ~62% of prepare calls to be cache misses, generating massive unnecessary network traffic.

Fix

When the cached future is already isDone(), return it directly. Completed futures are immutable — cancel(), complete(), and completeExceptionally() are all no-ops — so the defensive copy provides no protection value. Returning the cached instance directly ensures the caller holds a strong reference to the cache value, preventing GC eviction.

The defensive copy is retained for in-flight futures (not yet completed), where it still serves its original purpose: protecting the shared cache entry from cancellation by one of multiple concurrent waiters.

Testing

Added CqlPrepareAsyncProcessorTest with 4 tests:

  • should_return_cached_future_directly_when_already_completed — verifies identity (isSameAs) for completed futures
  • should_return_defensive_copy_when_future_is_in_flight — verifies copy for incomplete futures + cancellation isolation
  • should_keep_cache_entry_alive_when_caller_holds_completed_future — GC test proving the fix prevents eviction
  • should_allow_gc_eviction_when_no_strong_references_remain — confirms weak-value semantics still work when appropriate

All existing tests pass (including CqlPrepareHandlerTest).

Related

  • CUSTOMER-372: LWT write timeouts with repeated PREPARE statements
  • Follow-up PR will add a strong back-reference from PreparedStatement to the cached CF for complete liveness guarantee

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes prepare-cache weak-value eviction in CqlPrepareAsyncProcessor by avoiding unnecessary defensive copies for already-completed cached futures, which previously could lead to frequent cache misses and excessive PREPARE traffic under GC pressure.

Changes:

  • Return the cached CompletableFuture directly when it is already isDone(), retaining defensive copies only for in-flight futures.
  • Add unit tests covering completed vs in-flight behavior and (attempted) weak-value GC retention/eviction scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
core/src/main/java/com/datastax/oss/driver/internal/core/cql/CqlPrepareAsyncProcessor.java Return cached future directly when completed; keep defensive copy for in-flight to preserve cancellation isolation.
core/src/test/java/com/datastax/oss/driver/internal/core/cql/CqlPrepareAsyncProcessorTest.java Adds tests for completed/in-flight return behavior and weak-value cache retention/eviction under GC.
Comments suppressed due to low confidence (1)

core/src/test/java/com/datastax/oss/driver/internal/core/cql/CqlPrepareAsyncProcessorTest.java:156

  • The intent described in this test doesn’t match what’s asserted: it loops trying to observe eviction, but the only assertion is cache.size() >= 0, which is always true and won’t fail even if eviction never happens. Either assert a meaningful post-condition (with an explicit, bounded wait) or remove/repurpose the test to avoid giving a false sense of coverage.
    // Force GC - weak value should be collected
    for (int i = 0; i < 10; i++) {
      System.gc();
      Thread.sleep(50);
      cache.cleanUp();
      if (cache.getIfPresent(request) == null) {
        break;
      }
    }

    // Cache entry may have been evicted (weak values)
    // This is expected behavior - the fix ensures callers who DO hold a reference keep it alive
    // We just verify the cache doesn't throw
    assertThat(cache.size()).isGreaterThanOrEqualTo(0);
  }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

@dkropachev dkropachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good finding, please fix wording and merge

When the cached CompletableFuture is already done, return it directly
instead of creating a defensive copy via thenApply(x -> x). Completed
futures are immutable (cancel/complete are no-ops), so the copy only
served to release the caller's strong reference to the cached value,
causing premature weak-value eviction under GC pressure.

This led to repeated PREPARE requests being sent to all nodes on every
execution, as the cache entry would be garbage-collected between calls.
With prepare-on-all-nodes=true (default), each eviction multiplied the
re-prepare cost by the cluster node count.

The defensive copy is still used for in-flight futures to protect the
shared cache entry from cancellation by concurrent waiters.

Ref: CUSTOMER-372
@nikagra nikagra force-pushed the fix/prepare-cache-defensive-copy branch from 2fd05f7 to 482d068 Compare May 18, 2026 20:47
@nikagra nikagra merged commit c777983 into scylladb:scylla-4.x May 18, 2026
20 checks passed
@nikagra nikagra deleted the fix/prepare-cache-defensive-copy branch May 18, 2026 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants