
fix: gate managed zccache sidecar install and surface blocking status to clients #601

@zackees

Description


Context

The managed zccache sidecar is installed lazily from the build path:

  • crates/fbuild-build/src/zccache.rs resolves zccache through find_zccache().
  • find_zccache() uses a process-local OnceLock<Option<PathBuf>>, so within a single daemon process only one thread runs managed_zccache::ensure() and other same-process callers block on that initialization.
  • crates/fbuild-build/src/managed_zccache.rs::ensure() checks for the final binary, then downloads SHA256SUMS plus the release archive, verifies it, extracts into a unique staging dir, and finally renames into ~/.fbuild/<mode>/bin/zccache-<version>/.
  • The final rename handles a lost race: if another installer already placed the final binary, the loser deletes its staging dir and succeeds.
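
A rough std-only Rust sketch of the existing flow may help readers picture the race. It is not fbuild's code: `find_zccache_with` and `promote_staging` are hypothetical names, and it omits the download, SHA256SUMS check, and extraction that the real `ensure()` performs before the rename.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::OnceLock;

/// Process-local cache: the first caller runs the resolver, and every other
/// caller in the same process blocks until it finishes, then reuses the result.
static ZCCACHE: OnceLock<Option<PathBuf>> = OnceLock::new();

/// Hypothetical stand-in for find_zccache(); `resolve` plays the role of ensure().
fn find_zccache_with(resolve: impl FnOnce() -> Option<PathBuf>) -> Option<PathBuf> {
    ZCCACHE.get_or_init(resolve).clone()
}

/// Moves a fully extracted, installer-unique staging dir into its final place.
/// If another installer already won, the loser deletes its staging dir and
/// still reports success because the final binary is present.
fn promote_staging(staging: &Path, final_dir: &Path, exe_name: &str) -> io::Result<PathBuf> {
    let final_exe = final_dir.join(exe_name);
    match fs::rename(staging, final_dir) {
        Ok(()) => Ok(final_exe),
        // Renaming onto an existing, non-empty directory fails; treat that as a lost race.
        Err(_) if final_exe.is_file() => {
            fs::remove_dir_all(staging)?;
            Ok(final_exe)
        }
        Err(e) => Err(e),
    }
}
```

The lost-race branch works because renaming a directory onto an existing, non-empty directory fails on Linux, macOS, and Windows, and the `is_file()` check turns that failure into success. Nothing in this flow stops two processes from both reaching the download step.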

That final atomic rename prevents a half-installed zccache from being observed, but it does not prevent duplicate work before the rename. Multiple fbuild daemons/processes that hit first-run zccache setup at the same time can each download and extract the same archive. The daemon also does not expose a distinct "managed zccache download/install in progress" state; blocked clients just see a generic build/install wait.

Current daemon behavior:

  • POST /api/build and POST /api/install-deps serialize per project through ctx.project_lock(project_dir).
  • Streaming builds return the NDJSON response immediately and can wait up to the CLI's 1800s request timeout.
  • Non-streaming operations also use the generic 1800s daemon POST timeout.
  • /ws/status and daemon status snapshots expose operation_in_progress and current_operation, but not a structured dependency-install phase or wait reason.
  • OperationGuard updates in-memory daemon state but does not currently broadcast a status update when operation text changes.
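
The per-project serialization and the `OperationGuard` gap listed above can be pictured with a small sketch. `ProjectLocks`, `DaemonState`, and this `OperationGuard` are hypothetical std-only stand-ins; the real daemon is likely async and may use different primitives. The guard updates in-memory state only, so a /ws/status client sees a change only when a snapshot is next read.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for ctx.project_lock(): one mutex per project dir, so two
/// requests for the same project serialize while different projects run in parallel.
#[derive(Default)]
struct ProjectLocks {
    locks: Mutex<HashMap<PathBuf, Arc<Mutex<()>>>>,
}

impl ProjectLocks {
    fn project_lock(&self, project_dir: &Path) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(project_dir.to_path_buf())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}

/// The two fields the issue says status snapshots expose today.
#[derive(Default)]
struct DaemonState {
    operation_in_progress: bool,
    current_operation: Option<String>,
}

/// Marks an operation as running for as long as the guard lives.
/// Nothing here notifies /ws/status subscribers, which is the gap described above.
struct OperationGuard {
    state: Arc<Mutex<DaemonState>>,
}

impl OperationGuard {
    fn new(state: Arc<Mutex<DaemonState>>, text: &str) -> Self {
        {
            let mut s = state.lock().unwrap();
            s.operation_in_progress = true;
            s.current_operation = Some(text.to_string());
        }
        OperationGuard { state }
    }

    /// Updates the in-memory text only; no event is pushed to clients.
    fn set_text(&self, text: &str) {
        self.state.lock().unwrap().current_operation = Some(text.to_string());
    }
}

impl Drop for OperationGuard {
    fn drop(&mut self) {
        let mut s = self.state.lock().unwrap();
        s.operation_in_progress = false;
        s.current_operation = None;
    }
}
```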

The desired behavior is that the daemon owns first-run sidecar install coordination: one installer performs the download, other clients block intentionally, and blocked clients receive a clear status/progress signal instead of appearing hung or timing out.

Proposal

Add a daemon/process-shared managed dependency install gate for the zccache sidecar.

Suggested shape:

  1. Add a cross-process lock around managed zccache installation before network fetch begins, not only at final rename.
    • Lock scope should be the final managed version dir or a sibling lock file under ~/.fbuild/<mode>/bin/.
    • Within the lock, re-check managed_zccache_exe().is_file() before downloading.
    • Other processes should wait on the lock instead of starting duplicate downloads.
  2. Add daemon-visible install state for managed sidecar setup.
    • At minimum: dependency_install_in_progress, dependency_name = "zccache", dependency_version, phase (waiting_for_lock, downloading, verifying, extracting, installed), and a human-readable message.
    • This should update /health, /api/daemon/info, or a dedicated status endpoint, and /ws/status should broadcast phase changes.
  3. Ensure clients do not time out while intentionally blocked behind a sidecar install.
    • Streaming clients should receive periodic NDJSON/status events while waiting.
    • Non-streaming clients should either use the existing 1800s timeout safely or get a clear long-running operation response contract if the wait exceeds a threshold.
    • The status should distinguish "waiting for another fbuild process to install zccache" from "compiling".
  4. Keep failure behavior explicit.
    • If the installer fails, waiters should observe the failure and either retry once or fall back to ambient zccache discovery according to the existing find_zccache() policy.
    • Do not leave stale lock state that blocks future clients forever.
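
Steps 1 and 4 together suggest a gate shaped roughly like the following. This is a std-only sketch that assumes a sibling lock file (for example `~/.fbuild/<mode>/bin/zccache-<version>.lock`, a hypothetical name) created with `create_new`; `install_gated`, `LockFileGuard`, and the age-based staleness rule are illustrative choices, not existing fbuild APIs. It re-checks the final binary after acquiring the lock, removes the lock file on any return or unwinding panic, and breaks a lock file older than `stale_after` that a killed installer left behind.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::thread;
use std::time::{Duration, SystemTime};

/// Removes the lock file when dropped: on success, on error, or during a panic unwind.
struct LockFileGuard(PathBuf);

impl Drop for LockFileGuard {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.0);
    }
}

/// Assumption: a lock file older than `stale_after` belongs to a crashed installer.
fn is_stale(lock: &Path, stale_after: Duration) -> bool {
    fs::metadata(lock)
        .and_then(|m| m.modified())
        .ok()
        .and_then(|t| SystemTime::now().duration_since(t).ok())
        .map_or(false, |age| age > stale_after)
}

/// Runs `install` in at most one caller at a time; everyone else waits on the lock
/// file and returns as soon as the final binary appears.
fn install_gated<F>(
    lock: &Path,
    final_exe: &Path,
    stale_after: Duration,
    install: F,
) -> io::Result<PathBuf>
where
    F: FnOnce() -> io::Result<()>,
{
    let mut install = Some(install);
    loop {
        if final_exe.is_file() {
            return Ok(final_exe.to_path_buf());
        }
        match OpenOptions::new().write(true).create_new(true).open(lock) {
            Ok(mut f) => {
                let _guard = LockFileGuard(lock.to_path_buf());
                writeln!(f, "{}", std::process::id())?;
                // Re-check under the lock: a previous holder may have just finished.
                if !final_exe.is_file() {
                    let run = install.take().expect("install is attempted at most once per call");
                    run()?;
                }
                return Ok(final_exe.to_path_buf());
            }
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => {
                if is_stale(lock, stale_after) {
                    let _ = fs::remove_file(lock);
                    continue;
                }
                // Another installer holds the gate; this is where "waiting_for_lock" would be reported.
                thread::sleep(Duration::from_millis(50));
            }
            Err(e) => return Err(e),
        }
    }
}
```

A waiter whose installer failed simply acquires the lock next and makes its own attempt, which is one way to get the retry behavior from step 4. Two caveats: `stale_after` must exceed the worst-case download and extract time (or the installer must refresh the file's mtime), and two waiters can race to break the same stale file. An OS advisory lock such as flock(2) or LockFileEx avoids both, because the kernel releases it when the holding process exits; the standard library exposes this as `File::lock` from Rust 1.89.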

Acceptance criteria

  • Two concurrent fbuild clients starting with no managed zccache installed result in exactly one network download/extract attempt for the managed zccache version.
  • Other clients block on the daemon/process-shared install gate and do not start duplicate downloads.
  • While blocked, clients can observe a clear status such as "waiting for managed zccache 1.12.7 install" rather than only "building".
  • Streaming build clients receive periodic status/log events during long sidecar install waits so the connection stays alive and looks intentional.
  • Non-streaming build/install-deps clients do not hit a request timeout solely because another client is installing zccache.
  • A failed or interrupted installer does not leave a permanent stale lock; a later fbuild invocation can recover.
  • Tests cover same-process contention, cross-process/file-lock contention, and stale/interrupted install recovery.
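
For the status and heartbeat criteria, a waiting request handler could turn "the installer has not finished yet" into periodic NDJSON events. This sketch assumes the installer reports its outcome over a `std::sync::mpsc` channel; the field names `dependency_name`, `dependency_version`, `phase`, and `message` come from the proposal, while the `"type"` key, the `failed` phase, and `stream_install_wait` itself are hypothetical.

```rust
use std::fmt::Write as _;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Install phases from the proposal; `Failed` is an added assumption so
/// waiters can observe an unsuccessful install.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Phase {
    WaitingForLock,
    Downloading,
    Verifying,
    Extracting,
    Installed,
    Failed,
}

impl Phase {
    fn as_str(self) -> &'static str {
        match self {
            Phase::WaitingForLock => "waiting_for_lock",
            Phase::Downloading => "downloading",
            Phase::Verifying => "verifying",
            Phase::Extracting => "extracting",
            Phase::Installed => "installed",
            Phase::Failed => "failed",
        }
    }
}

/// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out
}

/// One NDJSON status event (a single line, no trailing newline).
fn status_line(version: &str, phase: Phase, message: &str) -> String {
    format!(
        r#"{{"type":"status","dependency_name":"zccache","dependency_version":"{}","phase":"{}","message":"{}"}}"#,
        json_escape(version),
        phase.as_str(),
        json_escape(message)
    )
}

/// Blocks until the installer reports an outcome, emitting a heartbeat every
/// `interval` so a streaming client sees progress instead of silence.
fn stream_install_wait(
    version: &str,
    outcome: &Receiver<Result<(), String>>,
    interval: Duration,
    mut emit: impl FnMut(String),
) -> Result<(), String> {
    let waiting = format!("waiting for managed zccache {version} install");
    loop {
        match outcome.recv_timeout(interval) {
            Ok(Ok(())) => {
                emit(status_line(version, Phase::Installed, "managed zccache installed"));
                return Ok(());
            }
            Ok(Err(e)) => {
                emit(status_line(version, Phase::Failed, &e));
                return Err(e);
            }
            Err(RecvTimeoutError::Timeout) => emit(status_line(version, Phase::WaitingForLock, &waiting)),
            Err(RecvTimeoutError::Disconnected) => {
                let e = "installer exited without reporting a result".to_string();
                emit(status_line(version, Phase::Failed, &e));
                return Err(e);
            }
        }
    }
}
```

With an interval well under the client's read timeout, a streaming client sees a steady series of `waiting_for_lock` events carrying the text "waiting for managed zccache 1.12.7 install", followed by a single `installed` or `failed` event, rather than a silent connection.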

Open questions

  • Should this gate live in fbuild-build::managed_zccache as a generic sidecar-install lock, or in fbuild-daemon so it can update daemon status directly?
  • Should zccache install status be represented as a new daemon state, or as structured fields alongside the existing Building state?
  • Should the same mechanism become a generic managed-dependency install gate for future sidecars, not just zccache?

Related issues
