e2e: abstract container behind backend interface (docker + hypeman) #273

Merged
rgarcia merged 12 commits into main from e2e-backend-interface
Jun 5, 2026
@rgarcia rgarcia commented Jun 4, 2026

Summary

Abstracts the e2e browser instance behind a Backend interface in server/e2e with two interchangeable implementations selected by KI_E2E_BACKEND (default docker, so existing CI is unchanged):

  • Docker backend: the original testcontainers-go logic, moved behind the interface.
  • Hypeman backend: starts the image as a remote VM via github.com/kernel/hypeman-go and reaches it through the host's wildcard ingress (hostname <instance>.<domain>, routed by listen port, TLS-terminated). The API reuses the host's existing 444→10001 browser ingress; the CDP (9222) and ChromeDriver (9224) ingresses are find-or-created once per host (matched by rule shape across all ingresses, never per-instance). The domain is derived from the base URL (KI_E2E_HYPEMAN_INGRESS_DOMAIN overrides it); KI_E2E_HYPEMAN_RAW_IP=1 falls back to the instance's private IP.

The ~24 e2e_*_test.go files keep using *TestContainer unchanged; it's now a thin facade over the selected Backend.
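The ingress routing described for the Hypeman backend can be illustrated with a small sketch. It is not the PR's code: `ingressEndpoints` and the `endpoints` type are hypothetical names, and treating 9222 and 9224 as the host-side listen ports (in addition to being guest ports) is an assumption. Only the 444→10001 mapping for the API is stated in the description.

```go
package main

import "fmt"

// Listen ports on the host's wildcard ingress. 444 forwards to guest port
// 10001 per the PR text; that 9222 and 9224 are also the listen ports is an
// assumption of this sketch.
const (
	apiListenPort = 444
	cdpListenPort = 9222
	cdListenPort  = 9224
)

// endpoints holds the three URLs a test needs to reach one instance.
type endpoints struct {
	API, CDP, ChromeDriver string
}

// ingressEndpoints is a hypothetical helper: it builds URLs of the form
// scheme://<instance>.<domain>:<listen port>, using https/wss when TLS is on.
func ingressEndpoints(instance, domain string, tls bool) endpoints {
	httpScheme, wsScheme := "http", "ws"
	if tls {
		httpScheme, wsScheme = "https", "wss"
	}
	host := instance + "." + domain
	return endpoints{
		API:          fmt.Sprintf("%s://%s:%d", httpScheme, host, apiListenPort),
		CDP:          fmt.Sprintf("%s://%s:%d", wsScheme, host, cdpListenPort),
		ChromeDriver: fmt.Sprintf("%s://%s:%d", httpScheme, host, cdListenPort),
	}
}

func main() {
	// Instance name and domain are placeholders, not real hosts.
	e := ingressEndpoints("ki-e2e-demo", "example.test", true)
	fmt.Println(e.API)
	fmt.Println(e.CDP)
	fmt.Println(e.ChromeDriver)
}
```

Because every URL is derived from the instance name and the domain alone, no per-instance ingress rule and no instance IP lookup is needed on this path.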

Config / env

  • KI_E2E_BACKEND=docker|hypeman
  • Hypeman: HYPEMAN_BASE_URL + HYPEMAN_API_KEY (or KI_E2E_HYPEMAN_BASE_URL / HYPEMAN_AUTH_TOKEN); optional KI_E2E_HYPEMAN_INGRESS_DOMAIN, KI_E2E_HYPEMAN_INGRESS_TLS (default on), KI_E2E_HYPEMAN_RAW_IP, KI_E2E_HYPEMAN_GPU_PROFILE (vGPU images), KI_E2E_HYPEMAN_SIZE, KI_E2E_HYPEMAN_DISK_IO_BPS (default 62MB/s).
  • Environment is read in exactly one place (hypemanConfigFromEnv): newHypemanBackend(image, cfg) and Start consume an explicit hypemanConfig, so the backend can be constructed programmatically with explicit options and never touches the env. Secrets are referenced by env-var name only, never hardcoded.
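A minimal sketch of the "read the environment in exactly one place" pattern follows. It is an illustration under assumptions, not the PR's `hypemanConfigFromEnv`: the field types, the precedence of the KI_E2E_-prefixed base URL over the SDK-native one, the interpretation of 62MB/s as 62,000,000 bytes per second, and the accepted value formats are all guesses. The Size and GPUProfile options are left out for brevity. The lookup function is passed in so the parser can be exercised without touching the real process environment.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hypemanConfig mirrors option names listed in the PR; the field types are
// assumptions of this sketch.
type hypemanConfig struct {
	BaseURL, Token, IngressDomain string
	IngressTLS, RawIP             bool
	DiskIOBps                     int64
	GPUDevices                    []string
}

// defaultDiskIOBps assumes "62MB/s" means 62 * 1000 * 1000 bytes per second.
const defaultDiskIOBps int64 = 62_000_000

// firstNonEmpty returns the first non-empty value among the named variables.
func firstNonEmpty(getenv func(string) string, names ...string) string {
	for _, n := range names {
		if v := getenv(n); v != "" {
			return v
		}
	}
	return ""
}

// configFromEnv is a hypothetical stand-in for hypemanConfigFromEnv.
func configFromEnv(getenv func(string) string) hypemanConfig {
	cfg := hypemanConfig{
		BaseURL:       firstNonEmpty(getenv, "KI_E2E_HYPEMAN_BASE_URL", "HYPEMAN_BASE_URL"),
		Token:         firstNonEmpty(getenv, "HYPEMAN_AUTH_TOKEN", "HYPEMAN_API_KEY"),
		IngressDomain: getenv("KI_E2E_HYPEMAN_INGRESS_DOMAIN"),
		IngressTLS:    true, // documented default: on
		RawIP:         getenv("KI_E2E_HYPEMAN_RAW_IP") == "1",
		DiskIOBps:     defaultDiskIOBps,
	}
	if v := getenv("KI_E2E_HYPEMAN_INGRESS_TLS"); v != "" {
		if b, err := strconv.ParseBool(v); err == nil {
			cfg.IngressTLS = b
		}
	}
	if v := getenv("KI_E2E_HYPEMAN_DISK_IO_BPS"); v != "" {
		if n, err := strconv.ParseInt(v, 10, 64); err == nil && n > 0 {
			cfg.DiskIOBps = n
		}
	}
	// Comma-split GPU devices, dropping empty entries.
	for _, d := range strings.Split(getenv("KI_E2E_HYPEMAN_GPU_DEVICES"), ",") {
		if d = strings.TrimSpace(d); d != "" {
			cfg.GPUDevices = append(cfg.GPUDevices, d)
		}
	}
	return cfg
}

func main() {
	// Placeholder values; real secrets would come from the environment.
	env := map[string]string{
		"HYPEMAN_BASE_URL": "https://hypeman.example.test",
		"HYPEMAN_API_KEY":  "from-env",
	}
	cfg := configFromEnv(func(k string) string { return env[k] })
	fmt.Printf("%+v\n", cfg)
}
```

A caller that wants explicit options can skip `configFromEnv` entirely and fill the struct by hand, which is the property the bullet above describes.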

Review feedback addressed

  • Dropped Container() testcontainers.Container from the interface + facade — it leaked Docker specifics and was dead (no test used it). The Docker backend keeps *testcontainers.Container internally.
  • De-leaked HostAccess — reframed from "Docker host.docker.internal (Docker backend only)" to a backend-agnostic capability ("reach a service on the test host"); the Docker backend maps host.docker.internal, the hypeman backend rejects it explicitly (no silent no-op) since a remote VM has no host-loopback bridge. Used by the private capmonster / persisted-login tests, which therefore stay on the Docker backend.
  • Start() no longer reads the environment — introduced a hypemanConfig struct holding every option (base URL, token, ingress domain/TLS, raw-IP, size, disk-IO, GPU devices/profile). Env parsing collapses to a single hypemanConfigFromEnv() called by the e2e factory; the backend and Start consume only the struct.
  • Defaulted DiskIoBps to 62MB/s (KI_E2E_HYPEMAN_DISK_IO_BPS overrides). Ad-hoc Hypeman instances otherwise get ~15 MB/s, which starves a playwright-daemon-dependent test's cold first-read (~43 MB of node_modules) past the server's 5 s daemon-start budget; at 62 MB/s the daemon starts in time. (Validated cross-repo in the private-fork mirror kernel/kernel-images-private#226, where the cookie-persistence e2e — which exercises ExecutePlaywrightCode — now passes on Hypeman; it previously failed on playwright daemon failed to start within 5s.)
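The disk-I/O figures in the last bullet can be put side by side with a short calculation. This is only the raw transfer time for the ~43 MB cold read at each rate; it ignores process start-up, module parsing, and any other I/O competing for the same budget, so it is a lower bound on daemon start time rather than a model of it.

```go
package main

import (
	"fmt"
	"time"
)

// readTime returns how long a sequential read of sizeMB megabytes takes at
// rateMBps megabytes per second, ignoring seek and caching effects.
func readTime(sizeMB, rateMBps float64) time.Duration {
	return time.Duration(sizeMB / rateMBps * float64(time.Second))
}

func main() {
	const nodeModulesMB = 43.0 // approximate cold-read size from the PR text
	budget := 5 * time.Second  // the server's daemon-start budget
	for _, rate := range []float64{15, 62} {
		t := readTime(nodeModulesMB, rate).Round(10 * time.Millisecond)
		fmt.Printf("%2.0f MB/s: read ~%v, leaving ~%v of the %v budget\n",
			rate, t, budget-t, budget)
	}
}
```

At roughly 15 MB/s the read alone consumes more than half of the 5 s budget, while at 62 MB/s it takes well under a second, which is consistent with the daemon starting in time only at the higher rate.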

Verification (both backends exercised end-to-end)

Build/vet/unit: go build ./..., go vet ./e2e/ clean; table tests cover backend selection, raw-IP vs ingress vs TLS endpoint derivation, the per-role ingress params, domain derivation, the HostAccess rejection, and hypemanConfigFromEnv mapping (SDK-native fallbacks, TLS default, comma-split GPU devices). The live Hypeman smoke ran via the new hypemanConfigFromEnv → newHypemanBackend(image, cfg) construction path.

Docker backend — PASS (onkernel/chromium-headful-private + chromium-headless-private):

--- PASS: TestScreenshotHeadless (41.78s)
--- PASS: TestDisplayResolutionChange (46.45s)

Hypeman backend — PASS against the live staging dev server (https://hypeman.dev-yul-hypeman-1.kernel.sh):

KI_E2E_BACKEND=hypeman \
HYPEMAN_BASE_URL=… HYPEMAN_API_KEY=… \
E2E_CHROMIUM_HEADLESS_IMAGE=onkernel/chromium-headless-private:be2ae22 \
go test -run TestDisplayResolutionChange ./e2e/
--- PASS: TestDisplayResolutionChange (30.13s)

This created a real instance, reused the :444→10001 ingress + created ki-e2e-cdp/ki-e2e-cd once, then drove PATCH /display (1024→1920×1080→1280×720) and verified Xvfb resolution via the API server + Exec, all over the TLS ingress. Instance + behavior confirmed; created ingresses persist for reuse, instances are cleaned up on Stop.

GPU (vGPU image): KI_E2E_HYPEMAN_GPU_PROFILE lets the backend boot chromium-headful-vgpu; the GPU-specific tests live in the private fork. They currently boot the vGPU instance to Running but its in-guest API needs the production GPU/Neko/NVIDIA-licensing env to become ready — tracked there.

Unblocks running the public e2e suite against the GPU image from kernel-images-private via the hypeman backend.


CI: running e2e against the Hypeman backend

Added a test-hypeman job to server-test.yaml that runs the same suite with KI_E2E_BACKEND=hypeman. It reuses the public onkernel/chromium-{headful,headless}:<sha> images that build-headful/build-headless push to Docker Hub — Hypeman pulls them itself on instance-create (any registry works via the host's docker creds; validated: the e2e suite already runs on Hypeman against a private onkernel/chromium-headless-private tag). Uses org variable HYPEMAN_BASE_URL + secret HYPEMAN_API_KEY. The runner needs no docker login.

This is the first full-suite run on the Hypeman backend; individual tests may still need backend-specific fixes, so it's reasonable to keep this check non-required in branch protection until it's consistently green.

Conceded: building images inside Hypeman (local-dev iteration) is blocked

The original goal also included a "build a local Dockerfile in Hypeman" path for local dev (edit Dockerfile → build in Hypeman → run e2e against it, without pushing to a registry). This is currently blocked and intentionally not implemented, because:

  • Hypeman can build from a local Dockerfile (POST /builds, async, usable as InstanceNewParams.Image — verified end-to-end with a trivial image), but
  • the builder VM's writable layer is RAM-backed and hard-capped at memory_mb=16384 (the API rejects more with memory_mb exceeds maximum of 16384 MB), and 16 GB is not enough for the chromium image build. Measured scaling on the headless Dockerfile: 2 GB → apt fails at ~18 s, 8 GB → ~28 s, 16 GB → apt passes but the concurrent Go-module/node stages then fail with no space left on device.

So building chromium in Hypeman needs a server-side change (a builder disk-size param decoupled from memory, a higher cap, or a pre-populated global_cache_key for the heavy base layers — the param exists with "ubuntu"/"browser" as documented example keys, which looks like the intended scaling path). Until then:

  • CI uses the registry-pull approach above (unblocked — CI already builds + pushes the images).
  • Local dev for the Hypeman backend should either use the Docker backend (devs already have Docker) or push their build to a registry Hypeman can pull. The local Dockerfile → Hypeman build path is deferred to a follow-up pending the builder-capacity fix.

Note

Medium Risk
Large new e2e infrastructure path (remote VM lifecycle, ingress mutation, secrets via env) with moderate blast radius if misconfigured in CI, though default Docker behavior is preserved.

Overview
Introduces a pluggable Backend for browser e2e instances so the same *TestContainer API can run against Docker (testcontainers, default) or remote Hypeman VMs via KI_E2E_BACKEND. Existing e2e tests stay on NewTestContainer/Start/Stop; Docker logic moves to dockerBackend, and a new hypemanBackend provisions VMs with hypeman-go, ingress or raw-IP routing, image-pull retries, and cleanup on failed bring-up.

Hypeman-specific behavior: env is parsed once into hypemanConfig; HostAccess is rejected (no host loopback bridge); default disk I/O is 62MB/s; wildcard ingress rules are find-or-created per host. TestContainer drops direct testcontainers exposure and adds helpers (ChromeDriverAddr, ChromeDriverWSURL) so BiDi/CDP tests use backend-derived URLs instead of hardcoded ports.
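The "reject explicitly, no silent no-op" treatment of HostAccess can be sketched as below. Everything here is hypothetical: the PR does not show how HostAccess is expressed, so `startOptions`, `checkOptions`, and the sentinel error are invented for illustration, and the backend types are empty stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// errHostAccessUnsupported is a hypothetical sentinel error for this sketch.
var errHostAccessUnsupported = errors.New("host access is not supported by this backend")

// startOptions is a hypothetical option struct; only HostAccess matters here.
type startOptions struct {
	HostAccess bool // the test needs to reach a service on the test host
}

// backend is reduced to the single capability check being illustrated.
type backend interface {
	checkOptions(startOptions) error
}

// dockerBackend can honor HostAccess (via host.docker.internal per the PR).
type dockerBackend struct{}

func (dockerBackend) checkOptions(startOptions) error { return nil }

// hypemanBackend runs a remote VM with no host-loopback bridge, so it fails
// loudly instead of ignoring the request.
type hypemanBackend struct{}

func (hypemanBackend) checkOptions(o startOptions) error {
	if o.HostAccess {
		return errHostAccessUnsupported
	}
	return nil
}

func main() {
	opts := startOptions{HostAccess: true}
	var docker, hypeman backend = dockerBackend{}, hypemanBackend{}
	fmt.Println("docker: ", docker.checkOptions(opts))
	fmt.Println("hypeman:", hypeman.checkOptions(opts))
}
```

Returning an error keeps a test that depends on host access from passing or hanging for the wrong reason on a backend that cannot provide it.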

Makefile: test splits into test-unit and test-e2e for CI jobs that only run e2e. Deps: adds github.com/kernel/hypeman-go plus unit tests in backend_test.go for backend selection and Hypeman endpoint/ingress logic.

Reviewed by Cursor Bugbot for commit 810889c. Bugbot is set up for automated code reviews on this repo.

Introduce a Backend interface in server/e2e that captures the public surface
the ~24 e2e_*_test.go files consume via *TestContainer (Start/Stop, the
API/CDP/ChromeDriver endpoint accessors, API clients, Wait* helpers, Exec,
ExitCh, Container). TestContainer is now a thin facade that delegates to a
Backend selected at construction time.

Two backends are provided:

- dockerBackend: the historical testcontainers-go logic, moved verbatim behind
  the interface. Default, so existing CI is unchanged.
- hypemanBackend: starts the image as a remote VM on a running Hypeman dev
  server via the github.com/kernel/hypeman-go client. Endpoints target the
  instance's network IP on the fixed guest ports (10001/9222/9224); Exec runs
  against the instance API server's /process/exec endpoint to preserve the
  (exitCode, combinedOutput, error) contract.

Backend selection is via the KI_E2E_BACKEND env var (docker|hypeman, default
docker). Hypeman connection details are read from env only and never hardcoded:
KI_E2E_HYPEMAN_BASE_URL (or HYPEMAN_BASE_URL) and HYPEMAN_AUTH_TOKEN (or the
SDK-native HYPEMAN_API_KEY). Optional GPU passthrough via
KI_E2E_HYPEMAN_GPU_DEVICES and VM sizing via KI_E2E_HYPEMAN_SIZE.
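Backend selection from KI_E2E_BACKEND might look like the following sketch. `backendKind` and `parseBackendKind` are hypothetical names, and rejecting unknown values with an error (rather than falling back to docker) is an assumption; the source only states the two accepted values and the docker default.

```go
package main

import (
	"fmt"
	"os"
)

// backendKind names one of the two e2e backends.
type backendKind string

const (
	backendDocker  backendKind = "docker"
	backendHypeman backendKind = "hypeman"
)

// parseBackendKind maps the KI_E2E_BACKEND value to a backend kind. An empty
// value selects docker, matching the documented default; any other unknown
// value is an error rather than a silent fallback (an assumption).
func parseBackendKind(v string) (backendKind, error) {
	switch v {
	case "", "docker":
		return backendDocker, nil
	case "hypeman":
		return backendHypeman, nil
	default:
		return "", fmt.Errorf("unknown KI_E2E_BACKEND %q (want docker or hypeman)", v)
	}
}

func main() {
	kind, err := parseBackendKind(os.Getenv("KI_E2E_BACKEND"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("selected backend:", kind)
}
```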

Test changes are minimal: six direct port-field accesses in two test files now
use backend-agnostic accessors (CDPAddr, ChromeDriverURL, plus new
ChromeDriverAddr/ChromeDriverWSURL helpers) instead of hardcoding
127.0.0.1:<port>, which only ever worked for the Docker backend.

Added infra-free unit tests for backend selection and hypeman config
validation. This unblocks running the e2e suite against the GPU image
(chromium-headful-vgpu) from kernel-images-private via the hypeman backend.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
socket-security Bot commented Jun 4, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|---|---|---|---|---|---|---|
| Updated | golang/github.com/docker/docker@v28.5.1+incompatible ⏵ v28.5.2+incompatible | 72 | 70 | 100 | 100 | 100 |
| Added | golang/github.com/kernel/hypeman-go@v0.20.0 | 71 | 100 | 100 | 100 | 100 |
| Updated | golang/golang.org/x/sync@v0.17.0 ⏵ v0.18.0 | 99 | 100 | 100 | 100 | 100 |

socket-security Bot commented Jun 4, 2026

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

Action: Warn
Severity: Critical
Alert: Critical CVE in golang google.golang.org/grpc: gRPC-Go has an authorization bypass via missing leading slash in :path

CVE: GHSA-p77j-4mvh-x3m3 gRPC-Go has an authorization bypass via missing leading slash in :path (CRITICAL)

Affected versions: < 1.79.3

Patched version: 1.79.3

From: golang/google.golang.org/grpc@v1.75.1

Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at support@socket.dev.

Suggestion: Remove or replace dependencies that include known critical CVEs. Consumers can use dependency overrides or npm audit fix --force to remove vulnerable dependencies.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore golang/google.golang.org/grpc@v1.75.1. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

rgarcia and others added 5 commits June 4, 2026 11:07
Addresses review feedback on the backend interface:

- Remove Container() testcontainers.Container from the Backend interface (and
  the TestContainer facade). It leaked Docker-specifics into the otherwise
  backend-agnostic surface and was dead: no e2e test consumed it. The Docker
  backend keeps its *testcontainers.Container internally for Start/Exec.

- Hypeman backend: reach instances via a single host-level wildcard ingress
  (find-or-create, keyed by tag managed-by=ki-e2e) instead of the instance's
  private network IP. Set KI_E2E_HYPEMAN_INGRESS_DOMAIN to route
  "<instance>-<role>.<domain>" through the host's reverse proxy to guest ports
  10001/9222/9224; ingress is created at most once per host and never per
  instance. Unset = previous raw-IP behavior (needs L3 reachability to the
  instance subnet). KI_E2E_HYPEMAN_INGRESS_TLS toggles https/wss on :443.

Verification: go build ./... and go vet ./e2e/ pass; new table tests cover
raw-IP, ingress, and TLS endpoint derivation plus the shared-ingress params.
Docker-backend e2e (TestDisplayResolutionChange + TestScreenshotHeadless)
passes against onkernel/chromium-headful-private + chromium-headless-private.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… Start)

Per review: Start() reading env vars is surprising and couples the backend to the
process environment. Introduce hypemanConfig holding every option (BaseURL, Token,
IngressDomain, IngressTLS, RawIP, Size, DiskIOBps, GPUDevices, GPUProfile).
newHypemanBackend(image, cfg) and Start now consume only the struct — env parsing
collapses to a single hypemanConfigFromEnv() called by the e2e factory, so other
callers can populate options explicitly and never touch the environment.

Also defaults DiskIOBps to 62MB/s (KI_E2E_HYPEMAN_DISK_IO_BPS overrides): ad-hoc
hypeman instances otherwise get ~15MB/s, which starves the in-guest playwright
daemon's cold first-read (~43MB of node_modules) past its 5s start budget. With
62MB/s the daemon starts in time — validated: persist_login TestCookiePersistence
Headless now PASSES on hypeman (was failing on "playwright daemon failed to start
within 5s").

go build/vet/unit pass (incl. new TestHypemanConfigFromEnv); live hypeman
TestDisplayResolutionChange passes via the new construction path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ckend

Mirrors the `test` job but with KI_E2E_BACKEND=hypeman, pointing
E2E_CHROMIUM_*_IMAGE at the public onkernel/chromium-{headful,headless}:<sha>
tags that build-headful/build-headless just pushed. Hypeman pulls those images
itself on instance create, so the runner needs no docker login. Uses org
var/secret HYPEMAN_API_URL / HYPEMAN_API_KEY.

Note: we deliberately do NOT build the images inside Hypeman — its builder VM's
writable layer is RAM-backed and hard-capped at memory_mb=16384, which is too
small for the chromium image build (fails with "no space left on device"). The
registry-pull approach sidesteps that entirely. See PR description.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rgarcia rgarcia marked this pull request as ready for review June 5, 2026 17:13
Comment thread server/e2e/container.go
Comment thread server/e2e/backend_hypeman.go
@firetiger-agent

Created a monitoring plan for this PR.

What this PR does: Adds a pluggable backend interface to the e2e test harness so browser instances can run on Hypeman remote VMs instead of (or alongside) local Docker containers, and wires up a new optional test-hypeman CI job to exercise it.

Intended effect:

  • test (Docker) CI job: baseline PASS (~11 min); confirmed if it stays green — the Docker backend is the moved-unchanged testcontainers-go path.
  • test-hypeman CI job: baseline FAIL on first run (expected per PR description; optional, non-required in branch protection); confirmed when it reaches a consistent green state over subsequent PRs.
  • Unit tests (backend selection, ingress routing, HostAccess rejection): covered by the existing test job; baseline PASS.

Risks:

  • Docker regression: test CI job; alert if it fails after merge, which would indicate the TestContainer/Backend facade refactor broke existing Docker e2e.
  • go.mod drift: go build ./... in CI; alert if the build fails, since the new github.com/kernel/hypeman-go and its transitive deps could conflict with downstream PRs.
  • Hypeman ingress mutation: test-hypeman CI logs; alert if ingress creation repeatedly errors, since wildcard rules are created once on the shared staging Hypeman host and could collide if the host already has conflicting rules.
  • Stale instances on job cancellation: no automated signal; check manually whether ki-e2e-* instances accumulate on the Hypeman staging host after force-cancelled CI runs.

Status updates will be posted automatically on this PR as monitoring progresses.


@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: CDP version uses plain HTTP
    • fetchBrowserWebSocketURL now derives /json/version from CDPURL and maps ws/wss to http/https so TLS ingress backends use HTTPS correctly.

Or push these changes by commenting:

@cursor push 42b24aa0dc
Preview (42b24aa0dc)
diff --git a/server/e2e/e2e_cdp_reconnect_test.go b/server/e2e/e2e_cdp_reconnect_test.go
--- a/server/e2e/e2e_cdp_reconnect_test.go
+++ b/server/e2e/e2e_cdp_reconnect_test.go
@@ -456,7 +456,20 @@
 }
 
 func fetchBrowserWebSocketURL(ctx context.Context, c *TestContainer) (string, error) {
-	versionURL := fmt.Sprintf("http://%s/json/version", c.CDPAddr())
+	versionEndpoint, err := url.Parse(c.CDPURL())
+	if err != nil {
+		return "", err
+	}
+	switch versionEndpoint.Scheme {
+	case "ws":
+		versionEndpoint.Scheme = "http"
+	case "wss":
+		versionEndpoint.Scheme = "https"
+	}
+	versionEndpoint.Path = "/json/version"
+	versionEndpoint.RawQuery = ""
+	versionEndpoint.Fragment = ""
+	versionURL := versionEndpoint.String()
 	req, err := http.NewRequestWithContext(ctx, http.MethodGet, versionURL, nil)
 	if err != nil {
 		return "", err

Reviewed by Cursor Bugbot for commit 32c997e.

Comment thread server/e2e/e2e_cdp_reconnect_test.go
cursor Bot commented Jun 5, 2026

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: TLS ChromeDriver URL helpers broken
    • ChromeDriver helper URLs now derive host and ws/wss scheme from parsed ChromeDriver/CDP URLs so TLS ingress endpoints are formed correctly.
  • ✅ Fixed: Hypeman Start leaks remote instances
    • Start now performs best-effort instance deletion rollback when post-create startup steps fail, preventing leaked Hypeman instances on failed starts.

Or push these changes by commenting:

@cursor push 78ab86654d
Preview (78ab86654d)
diff --git a/server/e2e/backend_hypeman.go b/server/e2e/backend_hypeman.go
--- a/server/e2e/backend_hypeman.go
+++ b/server/e2e/backend_hypeman.go
@@ -257,22 +257,36 @@
 	}
 	c.instanceID = inst.ID
 
+	cleanupOnError := func(startErr error) error {
+		cleanupCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+		defer cancel()
+		if err := c.client.Instances.Delete(cleanupCtx, c.instanceID); err != nil {
+			return fmt.Errorf("%w (cleanup failed for instance %s: %v)", startErr, c.instanceID, err)
+		}
+		c.instanceID = ""
+		c.ip = ""
+		return startErr
+	}
+
 	// Wait for the guest program to start. The SDK caps the server-side wait at
 	// a few minutes; loop until our context deadline if needed.
 	if err := c.waitForRunning(ctx); err != nil {
-		return err
+		return cleanupOnError(err)
 	}
 
 	if c.useIngress {
 		// Ensure the wildcard ingress rules exist; endpoints derive from the
 		// instance name + domain, so no instance IP is needed.
-		return c.ensureIngress(ctx)
+		if err := c.ensureIngress(ctx); err != nil {
+			return cleanupOnError(err)
+		}
+		return nil
 	}
 
 	// Raw-IP fallback: reach the instance directly on its private network IP.
 	ip, err := c.resolveIP(ctx)
 	if err != nil {
-		return err
+		return cleanupOnError(err)
 	}
 	c.ip = ip
 	return nil

diff --git a/server/e2e/container.go b/server/e2e/container.go
--- a/server/e2e/container.go
+++ b/server/e2e/container.go
@@ -2,6 +2,7 @@
 
 import (
 	"context"
+	"net/url"
 	"strings"
 	"testing"
 
@@ -68,12 +69,30 @@
 // derived from ChromeDriverURL (without scheme). Useful for substring assertions
 // on proxy-rewritten URLs.
 func (c *TestContainer) ChromeDriverAddr() string {
-	return strings.TrimPrefix(c.backend.ChromeDriverURL(), "http://")
+	u, err := url.Parse(c.backend.ChromeDriverURL())
+	if err == nil && u.Host != "" {
+		return u.Host
+	}
+
+	addr := strings.TrimPrefix(c.backend.ChromeDriverURL(), "http://")
+	return strings.TrimPrefix(addr, "https://")
 }
 
 // ChromeDriverWSURL returns the WebSocket URL (ws://host:port/path) for the
 // instance's ChromeDriver proxy. path should include a leading slash.
 func (c *TestContainer) ChromeDriverWSURL(path string) string {
+	u, err := url.Parse(c.backend.ChromeDriverURL())
+	if err == nil && u.Host != "" {
+		if u.Scheme == "https" {
+			u.Scheme = "wss"
+		} else {
+			u.Scheme = "ws"
+		}
+		u.Path = path
+		u.RawQuery = ""
+		u.Fragment = ""
+		return u.String()
+	}
 	return "ws://" + c.ChromeDriverAddr() + path
 }
 

diff --git a/server/e2e/e2e_cdp_reconnect_test.go b/server/e2e/e2e_cdp_reconnect_test.go
--- a/server/e2e/e2e_cdp_reconnect_test.go
+++ b/server/e2e/e2e_cdp_reconnect_test.go
@@ -456,12 +456,24 @@
 }
 
 func fetchBrowserWebSocketURL(ctx context.Context, c *TestContainer) (string, error) {
-	versionURL := fmt.Sprintf("http://%s/json/version", c.CDPAddr())
-	req, err := http.NewRequestWithContext(ctx, http.MethodGet, versionURL, nil)
+	versionURL, err := url.Parse(c.CDPURL())
 	if err != nil {
 		return "", err
 	}
+	if versionURL.Scheme == "wss" {
+		versionURL.Scheme = "https"
+	} else {
+		versionURL.Scheme = "http"
+	}
+	versionURL.Path = "/json/version"
+	versionURL.RawQuery = ""
+	versionURL.Fragment = ""
 
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, versionURL.String(), nil)
+	if err != nil {
+		return "", err
+	}
+
 	resp, err := http.DefaultClient.Do(req)
 	if err != nil {
 		return "", err

…rgets

The test-hypeman job ran `make test`, which runs unit tests first — a flaky
chromium-dependent unit test (lib/devtoolsproxy, unrelated to the backend)
failed and blocked the e2e suite from running at all. Split `test` into
`test-unit` + `test-e2e` and point the hypeman job at `test-e2e` so it exercises
only the e2e suite on the Hypeman backend (unit tests already run in the `test`
job). The var/secret fix is confirmed working — the prior config error is gone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sjmiller609 sjmiller609 left a comment

looks good

Comment thread server/e2e/backend_hypeman.go
rgarcia and others added 4 commits June 5, 2026 13:31
Bugbot (correctly): after Instances.New, a failure in waitForRunning/
ensureIngress/resolveIP returned from Start without deleting the instance, and
tests only register Stop after a successful Start — so failed runs leaked a
remote VM. Start now tears the instance down (fresh ctx, so a cancelled/expired
parent ctx still deletes) if bring-up fails.

Defense in depth for the cases Start can't cover (panic/timeout/crashed runner
after a successful Start): tag instances managed-by=ki-e2e on create, and add a
nightly workflow (hypeman-reap-e2e.yml) that deletes "ki-e2e-" instances older
than 3h (> the 2h e2e timeout, so it can't touch an in-progress run). One reaper
covers instances from both this repo and the private fork (shared dev server).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
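The reaper's selection rule from the commit message above (name prefix "ki-e2e-", older than 3h) can be written down as a small filter. The reaper itself is a workflow whose implementation is not shown, so this Go version is only an illustration; the `instance` type and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// instance is a hypothetical, minimal view of a Hypeman instance.
type instance struct {
	Name      string
	CreatedAt time.Time
}

const (
	namePrefix = "ki-e2e-"
	maxAge     = 3 * time.Hour // longer than the 2h e2e timeout
)

// isStale reports whether an instance with this name and age should be reaped.
func isStale(name string, age time.Duration) bool {
	return strings.HasPrefix(name, namePrefix) && age > maxAge
}

// staleInstances returns the e2e instances that are older than maxAge at now.
func staleInstances(all []instance, now time.Time) []instance {
	var stale []instance
	for _, in := range all {
		if isStale(in.Name, now.Sub(in.CreatedAt)) {
			stale = append(stale, in)
		}
	}
	return stale
}

func main() {
	now := time.Now()
	all := []instance{
		{Name: "ki-e2e-old", CreatedAt: now.Add(-4 * time.Hour)},
		{Name: "ki-e2e-running", CreatedAt: now.Add(-30 * time.Minute)},
		{Name: "unrelated", CreatedAt: now.Add(-24 * time.Hour)},
	}
	for _, in := range staleInstances(all, now) {
		fmt.Println("would delete:", in.Name)
	}
}
```

Keeping the age threshold above the e2e timeout is what prevents the reaper from deleting an instance that belongs to a run still in progress.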
…t_ready)

A freshly-pushed image tag isn't on the hypeman host yet on first use; the create
call returns a retryable 400 image_not_ready while the pull runs in the
background. Poll Instances.New until the pull completes or ctx is done, instead
of failing the first test that uses a new tag.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
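The polling behavior described in the commit message above can be sketched as a retry loop. How the real SDK surfaces the 400 image_not_ready response is not shown in the PR, so `errImageNotReady` is a stand-in sentinel, and `createWithRetry`, `simulate`, the retry interval, and the instance ID are all hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errImageNotReady stands in for the retryable "image_not_ready" response.
var errImageNotReady = errors.New("image_not_ready")

// createWithRetry calls create until it succeeds, fails with a non-retryable
// error, or ctx is done. interval is the pause between attempts.
func createWithRetry(ctx context.Context, interval time.Duration,
	create func(context.Context) (string, error)) (string, error) {
	for {
		id, err := create(ctx)
		if err == nil {
			return id, nil
		}
		if !errors.Is(err, errImageNotReady) {
			return "", err // anything else is treated as fatal
		}
		select {
		case <-ctx.Done():
			return "", fmt.Errorf("image still not ready: %w", ctx.Err())
		case <-time.After(interval):
		}
	}
}

// simulate runs createWithRetry against a fake create call that reports
// image_not_ready for the first notReady attempts, and returns the number of
// attempts made alongside the result.
func simulate(notReady int) (attempts int, id string, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	id, err = createWithRetry(ctx, time.Millisecond,
		func(context.Context) (string, error) {
			attempts++
			if attempts <= notReady {
				return "", errImageNotReady
			}
			return "inst-123", nil
		})
	return attempts, id, err
}

func main() {
	attempts, id, err := simulate(2)
	fmt.Println(attempts, id, err)
}
```

Only the one retryable error keeps the loop going; any other failure returns immediately, so a genuinely bad request does not spin until the context deadline.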
…rivate)

The hypeman e2e backend lives upstream here, but actually *running* it against
the staging hypeman server is moving to kernel-images-private on a Tailscale-
joined runner: CDP/ChromeDriver are being made tailnet-only, kernel-images is
public (its CI logs would leak live instance CDP URLs), and self-hosted/tailnet
runners shouldn't be exposed to a public repo. The public CI keeps the docker-
backend e2e only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rgarcia rgarcia merged commit d725de1 into main Jun 5, 2026
4 checks passed
@rgarcia rgarcia deleted the e2e-backend-interface branch June 5, 2026 19:32