
fix: retry stale GPU prewarm pods #735

Merged
bradhilton merged 1 commit into main from codex/gpu-steady-prewarm-retry
Jun 23, 2026

Conversation

@bradhilton

Collaborator

Summary

  • retry steady-state GPU prewarm DaemonSet pods whose mutable latest tag resolves to a stale digest
  • keep the digest verification as the gate before the workflow smoke-launches SkyPilot

Testing

  • bash -n scripts/build-gpu-image.sh
  • scripts/build-gpu-image.sh --help
  • git diff --check

The previous run on main reached the steady-state DaemonSet with 8/10 pods on the new digest and 2/10 pods still resolving latest to the prior digest. This patch deletes only those stale DaemonSet pods and lets Kubernetes recreate them, repeating until every refresh-tag init container resolves to the pushed digest.
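
For readers who want the shape of the retry, here is a minimal sketch of the idea, not the code merged in scripts/build-gpu-image.sh. The function names, the `list_pod_digests` helper, the `RETRY_DELAY_SECONDS` variable, and the retry budget are all assumptions made for illustration; only `kubectl delete pod` is a real command. It assumes something upstream can print one `<pod-name> <resolved-digest>` line per DaemonSet pod.

```shell
#!/usr/bin/env bash
# Sketch only: every name here is hypothetical, not taken from the real script.

# Read "<pod-name> <resolved-digest>" lines on stdin and print the names of
# pods whose resolved digest differs from the expected (pushed) digest.
select_stale_pods() {
  local expected="$1"
  local pod digest
  while read -r pod digest; do
    [ -n "$pod" ] || continue
    if [ "$digest" != "$expected" ]; then
      printf '%s\n' "$pod"
    fi
  done
}

# Delete only the stale pods and let the DaemonSet controller recreate them,
# re-checking until none remain or the retry budget is spent.
# list_pod_digests is an assumed helper that prints the lines described above.
retry_stale_pods() {
  local namespace="$1"
  local expected="$2"
  local max_retries="${3:-5}"
  local attempt=0
  local stale
  while [ "$attempt" -le "$max_retries" ]; do
    stale="$(list_pod_digests "$namespace" | select_stale_pods "$expected")"
    if [ -z "$stale" ]; then
      return 0  # every pod resolved to the pushed digest
    fi
    if [ "$attempt" -eq "$max_retries" ]; then
      break
    fi
    # Unquoted on purpose: $stale holds one pod name per line.
    kubectl delete pod --namespace "$namespace" $stale
    sleep "${RETRY_DELAY_SECONDS:-30}"
    attempt=$((attempt + 1))
  done
  return 1  # verification still failing, so the caller should not smoke-launch
}
```

In this sketch a non-zero return from `retry_stale_pods` plays the role of the digest verification gate: the caller would stop before the SkyPilot smoke launch rather than proceed with a mixed set of digests.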

@bradhilton merged commit c476bd5 into main on Jun 23, 2026
6 checks passed
@bradhilton deleted the codex/gpu-steady-prewarm-retry branch on June 23, 2026 03:55