fix: make e2e tests pass reliably locally with Docker Desktop #13741
- Fix stale image/container reuse across test runs
- Add registry readiness check and async removal polling
- Skip multi-arch test when docker driver supports it
- Use t.Cleanup for reliable teardown, fix project name mismatches
- Re-enable 4 previously skipped tests that now pass

Signed-off-by: Guillaume Lours <glours@users.noreply.github.com>
Pull request overview
This PR improves local reliability of the Go E2E test suite (especially on Docker Desktop) by reducing stale resource reuse between runs, adding readiness/cleanup polling, and adjusting a few tests to be more deterministic across environments.
Changes:
- Increase polling timeouts and add explicit readiness checks to avoid flakiness on slower Docker Desktop startups.
- Make teardown more reliable (e.g., `t.Cleanup`, explicit `down`, and polling for async `--rm` removal).
- Re-enable previously skipped E2E tests and adjust test commands/project naming to avoid mismatches and reuse.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/utils/safebuffer.go | Increase RequireEventuallyContains poll timeout to accommodate Docker Desktop startup latency. |
| pkg/e2e/watch_test.go | Ensure watched services are built and add explicit poll timeouts for watch/rebuild waits. |
| pkg/e2e/publish_test.go | Add registry readiness polling prior to publish operations. |
| pkg/e2e/ps_test.go | Add pre-test cleanup to avoid stale state from prior failed runs. |
| pkg/e2e/pause_test.go | Re-enable the pause E2E test by removing the CI skip. |
| pkg/e2e/networks_test.go | Re-enable TestNetworkConfigChanged. |
| pkg/e2e/model_test.go | Conditionally skip based on docker-model plugin availability; align project usage and ensure --rm. |
| pkg/e2e/env_file_test.go | Fix project name consistency and ensure run uses --rm and explicit -p. |
| pkg/e2e/compose_test.go | Use t.Cleanup for teardown to improve reliability. |
| pkg/e2e/compose_run_test.go | Poll for async container auto-removal after --rm to avoid timing flakes. |
| pkg/e2e/compose_run_build_once_test.go | Use t.Cleanup-based teardown and avoid pre-run cleanup with random project names. |
| pkg/e2e/build_test.go | Skip the “docker driver lacks multi-arch support” assertion when buildx reports multi-platform support under the docker driver. |
    // Wait for registry to be ready
    registryURL := "http://" + registry + "/v2/"
    poll.WaitOn(t, func(l poll.LogT) poll.Result {
    	resp, err := http.Get(registryURL) //nolint:gosec,noctx
    	if err != nil {
    		return poll.Continue("registry not ready: %v", err)
    	}
    	_ = resp.Body.Close()
    	if resp.StatusCode < 500 {
    		return poll.Success()
    	}
    	return poll.Continue("registry not ready, status %d", resp.StatusCode)
    }, poll.WithTimeout(10*time.Second), poll.WithDelay(100*time.Millisecond))
The registry readiness poll uses http.Get without any request timeout. If the TCP connection succeeds but the server never responds (or a proxy/network issue stalls reads), this can hang indefinitely and bypass the poll timeout. Use an http.Client with a short Timeout, or use the existing HTTPGetWithRetry helper (pkg/e2e/framework.go), so that each attempt is bounded by a per-request timeout and the overall poll timeout remains effective.
What I did
Related issue
N/A