8 changes: 7 additions & 1 deletion docs/features/telemetry.md
@@ -4,7 +4,7 @@ MCPProxy collects anonymous usage statistics to help improve the product. This p

## What is collected

MCPProxy sends a **daily heartbeat** containing only aggregate, non-identifying information. The current schema is **version 3** (`schema_version: 3` in the JSON payload); the schema is forward-compatible so older consumers simply ignore fields they don't recognize.
MCPProxy sends a **daily heartbeat** containing only aggregate, non-identifying information. The current schema is **version 5** (`schema_version: 5` in the JSON payload); the schema is forward-compatible so older consumers simply ignore fields they don't recognize.

| Field | Example | Purpose |
|-------|---------|---------|
@@ -22,9 +22,15 @@ MCPProxy sends a **daily heartbeat** containing only aggregate, non-identifying
| `feature_flags.docker_available` | `true` | Fraction of installs with a reachable Docker daemon (schema v3) |
| `server_protocol_counts` | `{"stdio":3,"http":2,"sse":0,"streamable_http":1,"auto":0}` | Ratio of remote-HTTP vs local-stdio upstreams (schema v3) |
| `server_docker_isolated_count` | `2` | How many configured servers the runtime actually wraps in Docker isolation (schema v3) |
| `feature_flags.docker_isolation_enabled` | `true` | Whether global Docker isolation is turned on (schema v5). Lets us tell "isolation on, 0 matching servers" apart from "isolation off" |
| `feature_flags.docker_cli_source` | `bundled` | How the `docker` CLI was located, as a fixed enum: `path` / `bundled` / `login_shell` / `absent` (schema v5). The direct signal for "Docker installed but not on the spawn PATH" (issue #696). The path string itself is **never** sent |

The `server_protocol_counts` map uses a **fixed enum of keys** (`stdio`, `http`, `sse`, `streamable_http`, `auto`) — server names and URLs are never included. Unknown or misconfigured protocol values are bucketed into `auto`.
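
The bucketing rule above can be sketched in Go as follows. This is a minimal illustration, not the MCPProxy implementation: `countProtocols` and `protocolKeys` are hypothetical names, and the choice to always emit all five keys is inferred from the zero-valued entries in the table's example.

```go
package main

import "fmt"

// protocolKeys is the fixed enum of keys listed in the docs above.
var protocolKeys = []string{"stdio", "http", "sse", "streamable_http", "auto"}

// countProtocols (hypothetical helper) tallies configured protocol values
// into the fixed key set. Anything outside the enum, such as a typo or an
// empty string, is bucketed into "auto", so no free-form configuration
// value can reach the payload.
func countProtocols(configured []string) map[string]int {
	counts := make(map[string]int, len(protocolKeys))
	for _, k := range protocolKeys {
		counts[k] = 0 // emit every key, even when its count is zero
	}
	for _, p := range configured {
		if _, ok := counts[p]; ok {
			counts[p]++
		} else {
			counts["auto"]++
		}
	}
	return counts
}

func main() {
	// "websocket" and "" are not in the enum, so both land in "auto".
	fmt.Println(countProtocols([]string{"stdio", "stdio", "http", "websocket", ""}))
}
```

Seeding the map with the enum keys first means the membership test and the "always emit every key" behaviour share one source of truth.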

The `docker_cli_source` field is likewise a **fixed enum** (`path`, `bundled`, `login_shell`, `absent`); the resolved path is never transmitted.

Docker isolation failures surface in `error_code_counts_24h` via three stable diagnostic codes (schema v5): `MCPX_DOCKER_CLI_NOT_FOUND` (isolation requested but the `docker` binary is unresolved — issue #696), `MCPX_DOCKER_EXEC_NOT_FOUND` (the image lacks the interpreter the server needs, e.g. `uvx` missing in `python:3.11`), and `MCPX_DOCKER_OCI_RUNTIME` (OCI runtime / architecture-mismatch failures).
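
As a rough illustration of how the schema v5 fields fit together, the sketch below marshals a heartbeat fragment. The struct layout is an assumption derived from the dotted field names in the table (`feature_flags.*` as a nested object); the type names, `encode`, and `validCLISource` are hypothetical, and the real payload carries more fields than shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// featureFlags mirrors the Docker-related feature_flags.* rows (assumed nesting).
type featureFlags struct {
	DockerAvailable        bool   `json:"docker_available"`
	DockerIsolationEnabled bool   `json:"docker_isolation_enabled"`
	DockerCLISource        string `json:"docker_cli_source"`
}

// heartbeat is a FRAGMENT of the payload, limited to fields named in the docs.
type heartbeat struct {
	SchemaVersion             int            `json:"schema_version"`
	FeatureFlags              featureFlags   `json:"feature_flags"`
	ServerDockerIsolatedCount int            `json:"server_docker_isolated_count"`
	ErrorCodeCounts24h        map[string]int `json:"error_code_counts_24h"`
}

// cliSources is the fixed enum for docker_cli_source.
var cliSources = map[string]bool{"path": true, "bundled": true, "login_shell": true, "absent": true}

// validCLISource reports whether s is one of the four enum values.
func validCLISource(s string) bool { return cliSources[s] }

// encode (hypothetical) refuses to serialise a value outside the enum, so a
// resolved filesystem path can never be sent by mistake.
func encode(h heartbeat) (string, error) {
	if !validCLISource(h.FeatureFlags.DockerCLISource) {
		return "", fmt.Errorf("docker_cli_source is outside the fixed enum")
	}
	b, err := json.Marshal(h)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Isolation is on, but the CLI is absent and no server is wrapped:
	// the case that docker_isolation_enabled makes visible.
	out, err := encode(heartbeat{
		SchemaVersion:      5,
		FeatureFlags:       featureFlags{DockerIsolationEnabled: true, DockerCLISource: "absent"},
		ErrorCodeCounts24h: map[string]int{"MCPX_DOCKER_CLI_NOT_FOUND": 2},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```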

## What is NOT collected

The following is **never** collected:
86 changes: 86 additions & 0 deletions internal/diagnostics/classifier.go
@@ -119,6 +119,24 @@ func classifyDocker(err error, _ ClassifierHints) Code {
strings.Contains(msg, "docker") && strings.Contains(msg, "image") && strings.Contains(msg, "pull") && strings.Contains(msg, "fail"),
strings.Contains(msg, "manifest unknown"):
return DockerImagePullFailed
// docker CLI unresolved (#696). These shapes are unambiguous about the
// docker BINARY being missing, so they classify even without the
// DockerIsolated hint (e.g. shellwrap's resolution-failure error).
case strings.Contains(msg, "docker not found in path"),
strings.Contains(msg, "docker not found in login shell"),
strings.Contains(msg, "docker: command not found"),
strings.Contains(msg, "command not found: docker"), // zsh: "zsh:1: command not found: docker"
strings.Contains(msg, "docker: not found"),
strings.Contains(msg, `"docker": executable file not found`):
return DockerCLINotFound
// OCI runtime failures from `docker run`. NOTE: a BARE "exec format error"
// is intentionally NOT matched here — a non-docker, wrong-architecture host
// stdio binary emits the same string and must stay STDIO-classified. The
// docker-isolated path routes bare "exec format error" via the hinted
// classifyDockerIsolatedSpawn; here we require real OCI/runc context.
case strings.Contains(msg, "oci runtime"),
strings.Contains(msg, "runc"):
return DockerOCIRuntime
}
return ""
}
@@ -155,8 +173,66 @@ func classifyQuarantine(err error, _ ClassifierHints) Code {
return ""
}

// classifyDockerIsolatedSpawn maps a spawn/exec failure on a Docker-isolated
// server to a specific DOCKER code. Returns "" when the error is not a
// recognised docker-isolation failure (caller falls through to generic stdio
// handling).
//
// Case order is load-bearing:
// 1. The docker BINARY itself is missing (#696) — must win even though its
// message also contains "command not found" / "executable file not found".
// 2. The in-container interpreter is missing — real docker output nests this
// inside an "OCI runtime create failed: … exec: \"x\": executable file not
// found" string, so it must be checked BEFORE the generic OCI case below.
// 3. Any other OCI runtime failure (exec format error / runc).
func classifyDockerIsolatedSpawn(err error) Code {
// Host couldn't even start the docker binary (direct exec path).
var execErr *exec.Error
if errors.As(err, &execErr) && errors.Is(execErr.Err, syscall.ENOENT) &&
strings.Contains(strings.ToLower(execErr.Name), "docker") {
return DockerCLINotFound
}

msg := strings.ToLower(err.Error())
switch {
// (1) docker binary unresolved: shellwrap resolution failure, or the shell
// / Go exec layer reporting `docker` itself missing. Cover both shell
// wordings: bash/sh `docker: command not found` AND zsh's reversed
// `zsh:1: command not found: docker` (the common macOS login-shell shape) —
// the latter must beat the generic "command not found" → EXEC case below.
case strings.Contains(msg, `"docker": executable file not found`),
strings.Contains(msg, "docker: command not found"),
strings.Contains(msg, "command not found: docker"),
strings.Contains(msg, "docker: not found"),
strings.Contains(msg, "docker not found in path"),
strings.Contains(msg, "docker not found in login shell"):
return DockerCLINotFound
// (2) in-container interpreter missing (image lacks uvx/node/python/…).
case strings.Contains(msg, "executable file not found"),
strings.Contains(msg, "no such file or directory"),
strings.Contains(msg, "command not found"):
return DockerExecNotFound
// (3) other OCI runtime failures (arch mismatch, runc start failure).
case strings.Contains(msg, "oci runtime"),
strings.Contains(msg, "exec format error"),
strings.Contains(msg, "runc"):
return DockerOCIRuntime
}
return ""
}
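
To see why the case order in `classifyDockerIsolatedSpawn` is load-bearing, here is a standalone sketch that runs the same substring checks in two different orders. It is a simplified stand-in, not the function above: it omits the `exec.Error` check, and `classifyOrdered`, `classifyOCIFirst`, and the predicate helpers are hypothetical names. The sample message is the one used in the PR's `exec_not_found` test case.

```go
package main

import (
	"fmt"
	"strings"
)

type Code string

const (
	cliNotFound  Code = "MCPX_DOCKER_CLI_NOT_FOUND"
	execNotFound Code = "MCPX_DOCKER_EXEC_NOT_FOUND"
	ociRuntime   Code = "MCPX_DOCKER_OCI_RUNTIME"
)

// containsAny reports whether msg contains at least one needle.
func containsAny(msg string, needles ...string) bool {
	for _, n := range needles {
		if strings.Contains(msg, n) {
			return true
		}
	}
	return false
}

func isCLIMissing(msg string) bool {
	return containsAny(msg, `"docker": executable file not found`,
		"docker: command not found", "command not found: docker", "docker: not found")
}

func isInterpreterMissing(msg string) bool {
	return containsAny(msg, "executable file not found", "no such file or directory", "command not found")
}

func isOCI(msg string) bool {
	return containsAny(msg, "oci runtime", "exec format error", "runc")
}

// classifyOrdered checks CLI-missing, then interpreter-missing, then OCI.
func classifyOrdered(raw string) Code {
	msg := strings.ToLower(raw)
	switch {
	case isCLIMissing(msg):
		return cliNotFound
	case isInterpreterMissing(msg):
		return execNotFound
	case isOCI(msg):
		return ociRuntime
	}
	return ""
}

// classifyOCIFirst swaps the last two cases to show the misclassification.
func classifyOCIFirst(raw string) Code {
	msg := strings.ToLower(raw)
	switch {
	case isCLIMissing(msg):
		return cliNotFound
	case isOCI(msg):
		return ociRuntime
	case isInterpreterMissing(msg):
		return execNotFound
	}
	return ""
}

func main() {
	nested := `docker: Error response from daemon: failed to create task: OCI runtime create failed: runc create failed: exec: "uvx": executable file not found in $PATH: unknown`
	fmt.Println("ordered:  ", classifyOrdered(nested))
	fmt.Println("OCI first:", classifyOCIFirst(nested))
	fmt.Println("zsh shape:", classifyOrdered("zsh:1: command not found: docker"))
}
```

With the OCI case moved ahead of the interpreter case, the nested `uvx` message is reported as a generic OCI runtime failure and the more specific cause is lost.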

// classifyStdio handles os/exec spawn errors and handshake failures.
func classifyStdio(err error, hints ClassifierHints) Code {
// Docker-isolated servers run `docker run …` over the stdio transport, so
// ENOENT-class failures here are docker-specific (#696 CLI missing, or an
// image/interpreter mismatch) rather than a plain host-binary miss. Resolve
// those to DOCKER codes before the generic stdio matching below.
if hints.DockerIsolated {
if c := classifyDockerIsolatedSpawn(err); c != "" {
return c
}
}

var execErr *exec.Error
if errors.As(err, &execErr) {
// exec.Error wraps os.PathError which wraps syscall.Errno; ENOENT/EACCES
@@ -167,6 +243,9 @@ func classifyStdio(err error, hints ClassifierHints) Code {
if errors.Is(execErr.Err, syscall.EACCES) {
return STDIOSpawnEACCES
}
if errors.Is(execErr.Err, syscall.ENOEXEC) {
return STDIOSpawnExecFormat
}
}

// exec.ExitError — process started but exited non-zero during handshake.
@@ -193,6 +272,13 @@ func classifyStdio(err error, hints ClassifierHints) Code {
msg := err.Error()
lmsg := strings.ToLower(msg)
switch {
// Wrong-arch / non-executable host binary (ENOEXEC). Guarded against
// docker OCI context ("oci runtime"/"runc") so a real containerized
// exec-format failure still falls through to classifyDocker → OCI; a
// BARE "exec format error" is a host stdio problem, not a Docker one.
case strings.Contains(lmsg, "exec format error") &&
!strings.Contains(lmsg, "oci runtime") && !strings.Contains(lmsg, "runc"):
return STDIOSpawnExecFormat
case strings.Contains(lmsg, "no such file or directory"),
strings.Contains(lmsg, "executable file not found"),
strings.Contains(lmsg, "command not found"):
93 changes: 93 additions & 0 deletions internal/diagnostics/classifier_domains_test.go
@@ -78,6 +78,99 @@ func TestClassify_Docker_SnapAppArmor(t *testing.T) {
}
}

// TestClassify_Docker_IsolationSpawn exercises the #696 / image-mismatch
// routing: when the docker-isolation hint is set, ENOENT-class failures on the
// stdio transport must resolve to DOCKER codes rather than a plain
// MCPX_STDIO_SPAWN_ENOENT, so the telemetry dashboard sees the real cause.
func TestClassify_Docker_IsolationSpawn(t *testing.T) {
cases := []struct {
name string
err error
hint ClassifierHints
want Code
}{
{
// #696: docker CLI not on the spawn PATH; the login-shell wrap
// reports `docker: command not found`.
name: "cli_not_found_shell",
err: errors.New("stdio transport (command=\"/bin/zsh\", docker_isolation=true): recent stderr: docker: command not found"),
hint: ClassifierHints{Transport: "stdio", DockerIsolated: true},
want: DockerCLINotFound,
},
{
// #696 via zsh login shell: the common macOS shape is the REVERSED
// wording `zsh:1: command not found: docker` (name after the colon),
// which must still classify as CLI-not-found, not in-container EXEC.
name: "cli_not_found_zsh_reversed",
err: errors.New("stdio transport (docker_isolation=true): recent stderr: zsh:1: command not found: docker"),
hint: ClassifierHints{Transport: "stdio", DockerIsolated: true},
want: DockerCLINotFound,
},
{
// shellwrap resolution failure. classifyDocker also matches this shape
// without the hint, although this case sets it.
name: "cli_not_found_resolve",
err: errors.New("docker not found in PATH or well-known locations"),
hint: ClassifierHints{Transport: "stdio", DockerIsolated: true},
want: DockerCLINotFound,
},
{
// In-container interpreter missing (e.g. uvx absent in python:3.11).
name: "exec_not_found",
err: errors.New("docker: Error response from daemon: failed to create task: OCI runtime create failed: runc create failed: exec: \"uvx\": executable file not found in $PATH: unknown"),
hint: ClassifierHints{Transport: "stdio", DockerIsolated: true},
want: DockerExecNotFound,
},
{
// OCI runtime / arch mismatch with no interpreter-missing detail.
name: "oci_runtime",
err: errors.New("docker: Error response from daemon: failed to create shim task: OCI runtime create failed: exec format error: unknown"),
hint: ClassifierHints{Transport: "stdio", DockerIsolated: true},
want: DockerOCIRuntime,
},
{
// Bare "exec format error" WITH the isolation hint → OCI (wrong-arch
// image under docker isolation).
name: "bare_exec_format_isolated",
err: errors.New("stdio transport (docker_isolation=true): recent stderr: exec format error"),
hint: ClassifierHints{Transport: "stdio", DockerIsolated: true},
want: DockerOCIRuntime,
},
{
// Bare "exec format error" WITHOUT the isolation hint must stay
// STDIO-classified (a non-docker wrong-arch host binary), NOT a
// Docker code. Codex round-5 regression.
name: "bare_exec_format_no_hint_stays_stdio",
err: errors.New("failed to spawn stdio server: recent stderr: exec format error"),
hint: ClassifierHints{Transport: "stdio"},
want: STDIOSpawnExecFormat,
},
{
// A real docker OCI error that lacks the hint but carries "oci
// runtime" context still classifies as OCI (not STDIO), via the
// classifyDocker fallback.
name: "oci_context_no_hint",
err: errors.New("oci runtime create failed: exec format error"),
hint: ClassifierHints{Transport: "stdio"},
want: DockerOCIRuntime,
},
{
// Same ENOENT string WITHOUT the isolation hint stays a plain stdio
// spawn failure — no false DOCKER attribution for host stdio servers.
name: "non_containerized_enoent",
err: errors.New("failed to spawn: executable file not found in $PATH"),
hint: ClassifierHints{Transport: "stdio"},
want: STDIOSpawnENOENT,
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
if got := Classify(tc.err, tc.hint); got != tc.want {
t.Errorf("Classify(%q) = %q, want %q", tc.err, got, tc.want)
}
})
}
}

// --- CONFIG -----------------------------------------------------------------

func TestClassify_Config_ParseError(t *testing.T) {
20 changes: 18 additions & 2 deletions internal/diagnostics/codes.go
@@ -8,8 +8,13 @@ package diagnostics

// STDIO domain — stdio-transport MCP server failures.
const (
STDIOSpawnENOENT Code = "MCPX_STDIO_SPAWN_ENOENT"
STDIOSpawnEACCES Code = "MCPX_STDIO_SPAWN_EACCES"
STDIOSpawnENOENT Code = "MCPX_STDIO_SPAWN_ENOENT"
STDIOSpawnEACCES Code = "MCPX_STDIO_SPAWN_EACCES"
// STDIOSpawnExecFormat: the stdio binary exists but is the wrong CPU
// architecture / not an executable format (ENOEXEC — "exec format error").
// Distinct from a Docker/OCI exec-format failure, which is
// MCPX_DOCKER_OCI_RUNTIME under the docker-isolation hint.
STDIOSpawnExecFormat Code = "MCPX_STDIO_SPAWN_EXEC_FORMAT"
STDIOExitNonzero Code = "MCPX_STDIO_EXIT_NONZERO"
STDIOExitBeforeInitialize Code = "MCPX_STDIO_EXIT_BEFORE_INITIALIZE"
STDIOHandshakeTimeout Code = "MCPX_STDIO_HANDSHAKE_TIMEOUT"
@@ -51,6 +56,17 @@ const (
DockerImagePullFailed Code = "MCPX_DOCKER_IMAGE_PULL_FAILED"
DockerNoPermission Code = "MCPX_DOCKER_NO_PERMISSION"
DockerSnapAppArmor Code = "MCPX_DOCKER_SNAP_APPARMOR"
// DockerCLINotFound: isolation was requested but the `docker` binary could
// not be resolved on the spawn PATH (issue #696 — Docker Desktop installed
// without the admin-gated CLI shim, or a LaunchAgent's minimal PATH).
DockerCLINotFound Code = "MCPX_DOCKER_CLI_NOT_FOUND"
// DockerExecNotFound: the container started but its entrypoint interpreter
// is missing from the image (e.g. `uvx` absent in `python:3.11`). Distinct
// from a HOST stdio ENOENT, which is MCPX_STDIO_SPAWN_ENOENT.
DockerExecNotFound Code = "MCPX_DOCKER_EXEC_NOT_FOUND"
// DockerOCIRuntime: the OCI runtime (runc) failed to start the container —
// e.g. an `exec format error` (image/host architecture mismatch).
DockerOCIRuntime Code = "MCPX_DOCKER_OCI_RUNTIME"
)

// CONFIG domain — configuration parsing and validation failures.
38 changes: 38 additions & 0 deletions internal/diagnostics/registry.go
@@ -51,6 +51,16 @@ func seedSTDIO() {
},
DocsURL: docsURL(STDIOSpawnEACCES),
})
register(CatalogEntry{
Code: STDIOSpawnExecFormat,
Severity: SeverityError,
UserMessage: "The configured command is the wrong CPU architecture or not an executable (exec format error). Install a build that matches this machine.",
FixSteps: []FixStep{
{Type: FixStepCommand, Label: "Check the binary's architecture", Command: "file <command-path>"},
{Type: FixStepLink, Label: "Install a matching build", URL: docsURL(STDIOSpawnExecFormat)},
},
DocsURL: docsURL(STDIOSpawnExecFormat),
})
register(CatalogEntry{
Code: STDIOExitNonzero,
Severity: SeverityError,
@@ -292,6 +302,34 @@ func seedDOCKER() {
},
DocsURL: docsURL(DockerSnapAppArmor),
})
register(CatalogEntry{
Code: DockerCLINotFound,
Severity: SeverityError,
UserMessage: "Docker isolation is enabled but the `docker` command could not be found. Install Docker, or add its CLI to your PATH.",
FixSteps: []FixStep{
{Type: FixStepCommand, Label: "Check docker is on PATH", Command: "docker --version"},
{Type: FixStepLink, Label: "Install Docker / enable the CLI", URL: docsURL(DockerCLINotFound)},
},
DocsURL: docsURL(DockerCLINotFound),
})
register(CatalogEntry{
Code: DockerExecNotFound,
Severity: SeverityError,
UserMessage: "The Docker image is missing the interpreter this server needs (e.g. the image has no `uvx`/`node`). Pick an image that includes it.",
FixSteps: []FixStep{
{Type: FixStepLink, Label: "Choosing a Docker isolation image", URL: docsURL(DockerExecNotFound)},
},
DocsURL: docsURL(DockerExecNotFound),
})
register(CatalogEntry{
Code: DockerOCIRuntime,
Severity: SeverityError,
UserMessage: "The Docker container failed to start (OCI runtime error). This is often an image/CPU architecture mismatch.",
FixSteps: []FixStep{
{Type: FixStepLink, Label: "Troubleshooting OCI runtime errors", URL: docsURL(DockerOCIRuntime)},
},
DocsURL: docsURL(DockerOCIRuntime),
})
}

// --- CONFIG --------------------------------------------------------------
6 changes: 6 additions & 0 deletions internal/diagnostics/types.go
@@ -67,6 +67,12 @@ type DiagnosticError struct {
type ClassifierHints struct {
Transport string // "stdio", "http", "sse", "docker", etc.
ServerID string
// DockerIsolated is true when the failing server is launched through Docker
// isolation (`docker run …` over the stdio transport). It lets the
// classifier route ENOENT-class spawn failures to DOCKER codes (CLI missing
// per #696, in-container interpreter missing) instead of a generic
// MCPX_STDIO_SPAWN_ENOENT. See classifyDockerIsolatedSpawn.
DockerIsolated bool
}
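
The effect of the `DockerIsolated` hint can be sketched in isolation: the same error text resolves to a DOCKER code when the hint is set and to a STDIO code when it is not. This is a reduced model under stated assumptions, not the real `Classify`: the `hints` type and `classify` function are hypothetical, only a handful of message shapes are covered, and the sample strings are taken from the PR's test table.

```go
package main

import (
	"fmt"
	"strings"
)

type Code string

const (
	stdioENOENT        Code = "MCPX_STDIO_SPAWN_ENOENT"
	stdioExecFormat    Code = "MCPX_STDIO_SPAWN_EXEC_FORMAT"
	dockerExecNotFound Code = "MCPX_DOCKER_EXEC_NOT_FOUND"
	dockerOCIRuntime   Code = "MCPX_DOCKER_OCI_RUNTIME"
)

// hints is a cut-down stand-in for ClassifierHints.
type hints struct {
	Transport      string
	DockerIsolated bool
}

// classify (hypothetical) applies the hinted Docker routing first, then the
// generic matching, where a bare "exec format error" is guarded against OCI
// context so that it stays a host stdio problem.
func classify(raw string, h hints) Code {
	msg := strings.ToLower(raw)
	if h.DockerIsolated {
		switch {
		case strings.Contains(msg, "executable file not found"):
			return dockerExecNotFound
		case strings.Contains(msg, "exec format error"):
			return dockerOCIRuntime
		}
	}
	hasOCI := strings.Contains(msg, "oci runtime") || strings.Contains(msg, "runc")
	switch {
	case strings.Contains(msg, "exec format error") && !hasOCI:
		return stdioExecFormat
	case hasOCI:
		return dockerOCIRuntime
	case strings.Contains(msg, "executable file not found"):
		return stdioENOENT
	}
	return ""
}

func main() {
	enoent := "failed to spawn: executable file not found in $PATH"
	fmt.Println(classify(enoent, hints{Transport: "stdio"}))
	fmt.Println(classify(enoent, hints{Transport: "stdio", DockerIsolated: true}))
	fmt.Println(classify("recent stderr: exec format error", hints{Transport: "stdio"}))
	fmt.Println(classify("oci runtime create failed: exec format error", hints{Transport: "stdio"}))
}
```

Keeping the hint as an explicit boolean, rather than inferring isolation from the message text, is what prevents false DOCKER attribution for plain host stdio servers.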

// FixRequest is the input to a registered fixer.
1 change: 1 addition & 0 deletions internal/httpapi/telemetry_payload_test.go
@@ -36,6 +36,7 @@ func (fakeRuntimeStats) GetRoutingMode() string { return "retrieve_to
func (fakeRuntimeStats) IsQuarantineEnabled() bool { return true }
func (fakeRuntimeStats) IsDockerAvailable() bool { return false }
func (fakeRuntimeStats) GetDockerIsolatedServerCount() int { return 0 }
func (fakeRuntimeStats) GetDockerCLISource() string { return "absent" }

func TestHandleGetTelemetryPayload_OK(t *testing.T) {
logger := zap.NewNop().Sugar()
10 changes: 10 additions & 0 deletions internal/runtime/runtime.go
@@ -2596,6 +2596,16 @@ func (r *Runtime) IsDockerAvailable() bool {
return r.dockerProbeResult
}

// GetDockerCLISource returns the coarse, fixed-enum branch that resolved the
// docker CLI — "path" | "bundled" | "login_shell" | "absent" (implements
// telemetry.RuntimeStats, schema v5 / MCP-2745). This is the direct #696 fleet
// signal (docker installed but not on the spawn PATH). It delegates to
// shellwrap.ResolveDockerSource, which shares the process-wide docker-path
// cache, so this is cheap on the heartbeat path. NEVER returns the path itself.
func (r *Runtime) GetDockerCLISource() string {
return shellwrap.ResolveDockerSource(r.logger)
}

// GetDockerIsolatedServerCount returns how many currently-configured servers
// the runtime actually wraps in a Docker container (implements
// telemetry.RuntimeStats, schema v3).