fix(docker): direct-exec resolved docker binary for isolated spawn (#696 follow-up)#699
Conversation
Deploying mcpproxy-docs with
|
| Latest commit: |
e7ac964
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://2321a134.mcpproxy-docs.pages.dev |
| Branch Preview URL: | https://fix-mcp-2744-docker-direct-e.mcpproxy-docs.pages.dev |
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
📦 Build ArtifactsWorkflow Run: View Run Available Artifacts
How to DownloadOption 1: GitHub Web UI (easiest)
Option 2: GitHub CLI gh run download 27673404798 --repo smart-mcp-proxy/mcpproxy-go
|
🤖 Codex review (model gpt-5.5, grounded in
|
The MCP-2744 direct-exec docker-isolation spawn replaced the $SHELL -l -c login-shell wrap, so the child inherits only the secure-env allowlist. That allowlist excluded DOCKER_HOST/DOCKER_CONTEXT/DOCKER_CONFIG/DOCKER_TLS_VERIFY/ DOCKER_CERT_PATH, so for Colima/rootless/remote daemons the docker binary resolved but 'docker run' could not reach the daemon (silent regression). Add DockerDaemonEnvVars() to the secure-env allowlist (DefaultEnvConfig) and thread them through for command/stdio servers even when an operator supplies a custom Environment allowlist. These are connection pointers, not secrets. Related #699 Related #696
Addressed Codex P2 — Docker daemon env preserved on direct-exec path (b54d0b0)Good catch — confirmed and fixed. Root cause: the direct-exec spawn ( Fix:
Tests (TDD, all offline):
Verified: |
🤖 Codex re-review (round 2) — still REQUEST_CHANGES (1×P2)The env-var preservation fix doesn't actually cover the documented Colima/rootless case. [P2] Allowlist only preserves inherited vars — capture from the login shell —
|
…-exec spawn Codex round-2 (P2) on PR #699: the round-1 fix only added DOCKER_HOST/ DOCKER_CONTEXT/… to the secure-env allowlist, which preserves vars already in os.Environ(). When mcpproxy is launched from the macOS tray / LaunchAgent those vars exist ONLY in the user's login shell rc files, so the allowlist was a no-op and the direct-exec docker spawn still hit the default socket (Colima/rootless/remote daemons unreachable). Fix: capture the docker daemon-access vars FROM the login shell, mirroring the existing LoginShellPATH machinery. - internal/shellwrap/shellwrap.go — new LoginShellEnvVars(logger, keys): one `<shell> -l -c` invocation prints the requested vars bracketed by markers and separated by 0x1F (banner-safe), parsed and cached per-key process-wide. Returns only non-empty values; no-op on Windows / on capture failure. - internal/secureenv/manager.go — BuildSecureEnvironment, gated on EnhancePath (the upstream-spawn signal), injects any docker daemon var still missing from the env via the login shell. Vars already present (allowlist + os.Environ) are left untouched, so it only forks when something is actually missing, and the underlying capture is cached. Overridable loginShellEnvFn for tests. The round-1 allowlist append (client.go) is kept: it still covers vars that ARE in os.Environ() under a custom operator allowlist. The two paths are complementary. TDD: TestLoginShellEnvVars_CapturesFromShellOnly (var exported only inside the fake login shell, absent from os.Environ) + TestBuildSecureEnvironment_Injects DockerVarFromLoginShell (restrictive allowlist proves the login-shell path, not the allowlist, supplied DOCKER_HOST) + EnhancePath-gating guard. All offline. Related #696
Round-2 (P2) addressed — capture DOCKER_* from the login shell (
|
🤖 Codex re-review (round 3) — still REQUEST_CHANGES (2×P2)The login-shell DOCKER_* capture works, but it introduced two new issues — one robustness, one isolation/security. [P2] Verify the resolved path before direct-exec —
|
…rPath (#696) Minimal #696 fix per board decision (MCP-2744): drop the direct-exec spawn migration and all DOCKER_* env experiments; keep only the negative-cache fix. Root cause of 'status shows docker_path but spawn used bare docker': under a LaunchAgent/sandbox the first resolution can fail because only the restricted login-shell leg failed, caching a negative for dockerPathNegativeTTL=60s. Early upstream spawns hit that window and fall back to bare 'docker'. The well-known -path probe is a pure os.Stat (never sandbox- or login-shell-restricted), so a live cached negative must not shadow a docker binary present at a well-known path right now. ResolveDockerPath now re-runs the cheap os.Stat probe before honoring a still-live negative and, on success, upgrades the cache to a permanent success. With the cache fixed, the existing post-#697 shell wrap runs the resolved ABSOLUTE docker path as the command token, so login-shell PATH is irrelevant and #696 is fixed with no Colima regression and no env handling change. The direct-exec migration + login-shell env hydration are deferred to a separate systematic task. TDD: TestResolveDockerPath_StatProbeOverridesLiveNegative asserts a successful os.Stat well-known probe wins over a still-live cached negative (no TTL expiry). Related #696 Co-Authored-By: Paperclip <noreply@paperclip.ing>
c9eac4f to
e7ac964
Compare
Revised to the minimal #696 fix per board decision (
|
🤖 Codex review — ACCEPT ✅Re-reviewed after re-scoping to the minimal #696 fix (negative-cache stat-probe override; direct-exec + DOCKER_* experiments reverted).
No P0–P2 findings. Fixes the "status shows Reviewed via |
…aunchd) — MCP-2751 (#701) * feat(shellwrap): hydrate login-shell env once at startup (macOS GUI/launchd) When mcpproxy is launched from a macOS GUI/launchd context (Launchpad, the SMAppService login item, or the tray spawning the core), the Go core inherits a launchd-minimal environment: it never sources the user's login shell, so it lacks Homebrew/Docker PATH entries and exported vars like DOCKER_HOST. This is the root-cause class behind the recent docker/PATH point-fixes (#697, #699). Add a one-time, allow-listed login-shell environment hydration at startup so every spawn path (docker lifecycle, stdio servers, uvx/npx, ResolveDockerPath, secureenv.BuildSecureEnvironment) inherits a correct environment with no call-site changes: - internal/shellwrap/hydrate.go: captureLoginShellEnv() sources `$SHELL -l -c 'env -0'` once (NUL-delimited, marker-bracketed, 5s timeout, sync.Once) and caches the parsed env. LoginShellPATH now reads PATH from this shared capture — one shell fork, not two. - HydrateFromLoginShell() gates on macOS + launchd-minimal PATH (no-op on terminal launches and non-macOS), merges PATH login-first, and applies a curated allow-list (DOCKER_*, HTTP(S)/NO_PROXY, NVM_DIR/ASDF_DIR/PYENV_ROOT/ VOLTA_HOME/HOMEBREW_PREFIX/COLIMA_HOME) set-if-empty via os.Setenv. Secrets are never imported; HOME/USER/SHELL are never touched; values are never logged (key names + lengths only). - secureenv: extend the default allow-list with the curated DOCKER_*/proxy/ tool-home keys so the hydrated vars survive the filter into upstream spawns. Scanner MinimalEnv stays narrow (credential-free). - cmd/mcpproxy: call HydrateFromLoginShell after logger setup, before any manager reads os.Environ(). Supersedes the per-spawn DOCKER_* env capture; the absolute-path docker probe (#697) and negative-cache fix (#699) remain as complementary fallbacks. Related #699 * fix(shellwrap): preserve intentionally-empty operator vars during hydration Codex review (PR #701, P2): the curated-key set-if-empty guard used `os.Getenv(key) != ""`, which treats an explicitly set-empty value the same as unset. An operator who sets `HTTPS_PROXY=` or `DOCKER_HOST=` to DISABLE an inherited value would have it overwritten from the login shell — violating the never-clobber-operator-values contract. Use os.LookupEnv to distinguish unset from intentionally-empty: any present key (even empty) is operator intent and is preserved. launchd never sets these to empty, so present-but-empty is always deliberate. Adds TestHydrate_PreservesIntentionallyEmptyVar. Co-Authored-By: Paperclip <noreply@paperclip.ing> * fix(shellwrap): alias-aware set-if-unset for proxy var spellings Codex round-2 (PR #701, P2): proxy vars (HTTP_PROXY/HTTPS_PROXY/NO_PROXY) are case-insensitive to HTTP clients, so the per-key LookupEnv guard could still clobber operator intent — e.g. operator sets `https_proxy=` (lowercase, to disable) but the login shell exports `HTTPS_PROXY=...`; the exact-key check on the uppercase spelling missed the lowercase presence and imported it anyway. Split the curated allow-list into single-spelling keys (DOCKER_*, tool-home) and an alias-aware proxy trio: for each logical proxy var, if EITHER spelling is already present in the process env (including intentionally empty), skip hydrating BOTH spellings. Adds TestHydrate_ProxyCaseAliasingPreservesOperatorIntent (daemon https_proxy='' + login HTTPS_PROXY=http://x → neither imported). Co-Authored-By: Paperclip <noreply@paperclip.ing> * fix(shellwrap): drop proxy vars, fix launch gate for pre-seeded PATH Codex round-3 scope reduction: - Remove HTTP_PROXY/HTTPS_PROXY/NO_PROXY from hydration and from secureenv DefaultEnvConfig allow-list. Proxy forwarding to every stdio upstream is out of scope; filed as a separate opt-in follow-up. - Fix the launch gate: hydration now triggers when PATH looks launchd- minimal OR any DOCKER_* curated var is absent. A GUI launcher can pre-seed PATH via /etc/paths yet still not export DOCKER_HOST from rc files, so PATH-comprehensiveness alone is insufficient. - PATH-merge stays gated on pathIsMinimal only — a comprehensive PATH is not widened even when the DOCKER_* gate fires. Tests: - Rename GateNoOpOnComprehensivePath → GateNoOpOnComprehensivePathAndAllDockerPresent (now requires both conditions, so all 5 DOCKER_* vars are pre-set) - Add ComprehensivePathStillHydratesDockerIfMissing — verifies the new secondary gate: comprehensive PATH, missing DOCKER_HOST → hydrates curated vars but leaves PATH unmodified - Drop PreservesIntentionallyEmptyVar and ProxyCaseAliasingPreservesOperatorIntent (proxy vars are no longer hydrated) - Update MinimalPathHydratesCuratedSet to assert proxy vars are NOT imported Co-Authored-By: Paperclip <noreply@paperclip.ing> --------- Co-authored-by: Paperclip <noreply@paperclip.ing>
…otless regression) Codex round-4 (PR #703 REQUEST_CHANGES, P2): the direct-exec branch regressed non-Darwin rootless/remote Docker. Dropping the `$SHELL -l` wrap also drops rc-file env inheritance, and HydrateFromLoginShell (MCP-2751) is Darwin-only while BuildSecureEnvironment only forwards DOCKER_* already in os.Environ(). So on Linux a daemon whose DOCKER_HOST/DOCKER_CONTEXT lives only in .profile would be lost by direct-exec — the same DOCKER_* loss that made #699 keep the shell wrap. Add a second precondition (dockerDaemonEnvGuaranteed) alongside the verified- absolute-executable check: direct-exec only when the daemon env is guaranteed without the login shell — macOS (startup hydration captured DOCKER_* into os.Environ()) OR DOCKER_HOST/DOCKER_CONTEXT already present in the process env on any platform. Otherwise keep the `$SHELL -l` wrap so `docker run` inherits the rc-file daemon config; the wrap now prefers the resolved absolute path (still sidestepping the login shell's PATH re-derivation) and drops to bare "docker" only when resolution failed. dockerDaemonEnvGOOS is a package var so the Darwin branch is testable on a non-Darwin CI host. New tests: non-Darwin + DOCKER_HOST-only-in-rc stays shell-wrapped with the absolute path; non-Darwin + DOCKER_HOST in env direct- execs; and a table test for the gate. Related #703 Related #699 Co-Authored-By: Paperclip <noreply@paperclip.ing>
…CP-2753) (#703) * fix(docker): direct-exec resolved docker binary for isolated spawn (MCP-2753) PR #697 resolved the docker CLI to an absolute path but still routed the isolated spawn through `$SHELL -l -c "<docker> run …"`, where the login shell re-derives $PATH from rc files and can drop the Docker Desktop bundle dir. The absolute path was only a token in the -c string, so docker-isolated servers still failed with `command not found: docker` when the CLI was not on the LaunchAgent/tray spawn PATH — even though `upstream_servers list` reported a good docker_path. On successful resolution to a VERIFIED absolute executable, setupDockerIsolation now returns the resolved binary as the exec command with raw docker argv (no shell wrap), mirroring the management path's newDockerCmd. It returns a shellWrapped flag so callers pick the matching cidfile helper: the new args-based insertCidfileIntoDockerArgs inserts `--cidfile` after the `run` token (platform-agnostic, sidestepping the -c vs /c quirk), while the login-shell fallback keeps the string-based insertCidfileIntoShellDockerCommand. The direct-exec branch is gated by isDirectExecutable (filepath.IsAbs + os.Stat + exec-bit). ResolveDockerPath's last resort runs `command -v docker` in the login shell, which can emit a shell function/alias/bare name; such a value is rejected and shell-wrapped instead of failing a direct exec. Login-shell wrap of bare `docker` is also kept on the resolution-failure fallback. The DOCKER_* daemon-env half of MCP-2753 is already satisfied by the startup login-shell hydration shipped in MCP-2751 (#701): DOCKER_* are hydrated into os.Environ() and carried through BuildSecureEnvironment's default allow-list, so the direct-exec'd docker process still reaches Colima/rootless/remote daemons without a per-spawn shell capture. Related #699 Related #696 * fix(docker): gate direct-exec on guaranteed daemon env (non-Darwin rootless regression) Codex round-4 (PR #703 REQUEST_CHANGES, P2): the direct-exec branch regressed non-Darwin rootless/remote Docker. Dropping the `$SHELL -l` wrap also drops rc-file env inheritance, and HydrateFromLoginShell (MCP-2751) is Darwin-only while BuildSecureEnvironment only forwards DOCKER_* already in os.Environ(). So on Linux a daemon whose DOCKER_HOST/DOCKER_CONTEXT lives only in .profile would be lost by direct-exec — the same DOCKER_* loss that made #699 keep the shell wrap. Add a second precondition (dockerDaemonEnvGuaranteed) alongside the verified- absolute-executable check: direct-exec only when the daemon env is guaranteed without the login shell — macOS (startup hydration captured DOCKER_* into os.Environ()) OR DOCKER_HOST/DOCKER_CONTEXT already present in the process env on any platform. Otherwise keep the `$SHELL -l` wrap so `docker run` inherits the rc-file daemon config; the wrap now prefers the resolved absolute path (still sidestepping the login shell's PATH re-derivation) and drops to bare "docker" only when resolution failed. dockerDaemonEnvGOOS is a package var so the Darwin branch is testable on a non-Darwin CI host. New tests: non-Darwin + DOCKER_HOST-only-in-rc stays shell-wrapped with the absolute path; non-Darwin + DOCKER_HOST in env direct- execs; and a table test for the gate. Related #703 Related #699 Co-Authored-By: Paperclip <noreply@paperclip.ing> --------- Co-authored-by: Paperclip <noreply@paperclip.ing>
Summary
Docker-isolated upstream servers still failed to start with
command not found: dockerwhen thedockerCLI is not on the spawn PATH (Docker Desktop installed without the optional CLI-tools step), even after PR #697. #697 resolved the absolute path but still routed the spawn through$SHELL -l -c "<docker> run …", where the login shell re-derives$PATHfrom rc files and can drop the bundle dir. The absolute path was only a token inside the-cstring, so the bug persisted whileupstream_servers listreported a healthydocker_path(same resolver, different time/path → "status available, spawn not found").Related #696. Branched from
origin/main(contains #697 /79849e0c).Changes
A — direct-exec the resolved docker binary (primary). On successful resolution,
setupDockerIsolationnow returns the resolved absolute docker binary as the exec command with the raw docker argv (no login-shell wrap), mirroring the management path'snewDockerCmd(exec.CommandContext(dockerBin, …)). It returns ashellWrappedflag so callers select the matching cidfile helper. Login-shell wrap of baredockeris kept only on the resolution-failure fallback. The spawn env already carries the enhanced PATH (BuildSecureEnvironment,EnhancePath=true), and exec'ing an absolute path needs no PATH at all.insertCidfileIntoDockerArgsinserts--cidfile <file>right after theruntoken in the argv slice — platform-agnostic, sidestepping the-cvs/cshell quirk. Both call sites (connection_stdio.go,connection_launcher.go) branch onshellWrapped.B — negative-cache cannot shadow a live os.Stat probe.
ResolveDockerPathre-runs the cheap, sandbox-immune well-knownos.Statprobe before honoring a still-live cached negative, so a failure cached when only the restricted login-shell leg failed can never permanently shadow a docker binary sitting at a well-known path.Tests (TDD — written first, watched fail, then implemented)
TestSetupDockerIsolationExecsResolvedBinaryDirectly(new, the regression that failed today): assertsfinalCommand == <resolved>,args[0] == "run", no shell-cindirection.TestSetupDockerIsolationUsesResolvedAbsolutePath,…CidfileInsertionWithAbsolutePath,…FallsBackToBareDocker,TestInsertCidfileWindowsCmdFormatfor the args-based cidfile + direct-exec shape; addedTestInsertCidfileIntoDockerArgs.TestResolveDockerPath_StatProbeOverridesLiveNegative(new) — a still-live negative must not shadow a now-present well-known docker.Verification
Docs:
docs/docker-isolation.mdtroubleshooting section updated to describe direct-exec (ENG-9).