
feat: generate mcp-config.json from MCPG runtime output instead of compile time #252

Open
jamesadevine wants to merge 20 commits into main from feat/runtime-mcp-config

Conversation

@jamesadevine
Collaborator

Summary

Replaces compile-time mcp-config.json generation with runtime conversion of MCPG's actual gateway output, matching gh-aw's convert_gateway_config_copilot.cjs pattern.

Problem

The agent inside AWF could not see configured MCP servers. The mcp-config.json (Copilot CLI client config) was generated at compile time and baked into the pipeline YAML via a {{ mcp_client_config }} heredoc. This produced configs that didn't match MCPG's actual runtime state — the agent had no working MCP connections.

Solution

Mirror gh-aw's approach precisely:

  1. Capture MCPG stdout → gateway-output.json (stderr → log file)
  2. Poll after the health check until the gateway output contains valid JSON with mcpServers (the health check alone doesn't guarantee stdout is flushed)
  3. Transform with jq — rewrite URLs (127.0.0.1 → host.docker.internal) and set tools: ["*"]
  4. Write the result to /tmp/awf-tools/mcp-config.json for the AWF-sandboxed agent
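The steps above can be sketched roughly as follows. This is a hedged illustration, not the template's exact contents: the real paths, ports, and jq filter are partly redacted in this PR, and the gateway output here is simulated rather than produced by MCPG.

```shell
# Simulated stand-in for the runtime capture → poll → transform flow.
OUT_DIR=$(mktemp -d)
GATEWAY_OUTPUT="$OUT_DIR/gateway-output.json"

# 1. In the pipeline, MCPG's stdout would be captured:
#    docker run ... > "$GATEWAY_OUTPUT" &
#    Here we seed a plausible gateway output instead (fixture, not real MCPG).
cat > "$GATEWAY_OUTPUT" <<'EOF'
{"mcpServers":{"ado":{"type":"http","url":"http://127.0.0.1:8123/ado"}}}
EOF

# 2. Poll until the file parses as JSON and contains an mcpServers key
GATEWAY_READY=0
for i in $(seq 1 15); do
  if jq -e '.mcpServers' "$GATEWAY_OUTPUT" >/dev/null 2>&1; then
    GATEWAY_READY=1
    break
  fi
  sleep 1
done
[ "$GATEWAY_READY" = 1 ] || { echo "gateway output never became ready" >&2; exit 1; }

# 3. Rewrite loopback URLs for the sandboxed agent and allow all tools
jq '.mcpServers |= map_values(
      .url |= sub("^http://127\\.0\\.0\\.1"; "http://host.docker.internal")
      | .tools = ["*"]
    )' "$GATEWAY_OUTPUT" > "$OUT_DIR/mcp-config.json"
cat "$OUT_DIR/mcp-config.json"
```

The two-phase structure matters: a passing health check shows MCPG is up, but only the JSON-validity poll proves its stdout has been flushed to disk.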

Why the URL rewrite?

MCPG runs with --network host and binds to 127.0.0.1. But the agent runs inside an AWF Docker container on a separate network — 127.0.0.1 there is the container's own loopback. host.docker.internal is the only path back to the host from inside AWF.

Changes

| File | Change |
| --- | --- |
| src/compile/common.rs | Removed generate_mcp_client_config() function and 3 associated tests |
| src/compile/standalone.rs | Removed mcp_client_config import, generation, and template replacement |
| src/compile/onees.rs | Same as standalone |
| src/data/base.yml | Removed {{ mcp_client_config }} heredoc; MCPG step now captures stdout, polls for output, transforms with jq |
| src/data/1es-base.yml | Same template changes as base.yml |
| AGENTS.md | Updated {{ mcp_client_config }} docs to reflect the new runtime approach |

Net: -89 lines (115 added, 204 removed)

Testing

  • cargo build
  • cargo test — all 66 tests pass ✅
  • Compiled examples/sample-agent.md — output contains no {{ mcp_client_config }}, has jq-based conversion ✅

jamesadevine and others added 3 commits April 17, 2026 23:03
…mpile time

Replaces the compile-time generate_mcp_client_config() approach with runtime
conversion of MCPG's actual gateway output, matching gh-aw's
convert_gateway_config_copilot.cjs pattern.

Changes:
- Remove generate_mcp_client_config() and {{ mcp_client_config }} template marker
- MCPG stdout is now captured to gateway-output.json
- After health check, poll until gateway output contains valid JSON
- Use jq to transform URLs (127.0.0.1 -> host.docker.internal) and add tools: [*]
- Apply same changes to both standalone and 1ES templates
- Update AGENTS.md documentation

This ensures the Copilot CLI config reflects MCPG's actual runtime state rather
than a compile-time prediction, fixing agents not seeing configured MCPs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The stderr redirect to /tmp/gh-aw/mcp-logs/stderr.log failed because the
directory didn't exist yet. Add it to the mkdir -p call alongside the
gateway output directory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use process substitution to tee stderr to both the log file and the
pipeline console, giving real-time visibility into MCPG startup logs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

🔍 Rust PR Review

Summary: Approach is sound and solves the real problem. One silent-failure bug in the shell script worth fixing before merge.

Findings

🐛 Bugs / Logic Issues

  • src/data/base.yml:304-306 (and 1es-base.yml equivalent) — The jq transform has no error handling. The MCPG step doesn't use set -e, so if jq exits non-zero (e.g., a server entry has url: null or malformed JSON), bash continues, mcp-config.json ends up empty, cp silently propagates that empty file, and the agent then fails with a cryptic error about missing MCP connections. Suggest:

    jq --arg prefix "(redacted) \
      '.mcpServers |= (...)' \
      "$GATEWAY_OUTPUT" > /tmp/awf-tools/mcp-config.json \
      || { echo "##vso[task.complete result=Failed]Failed to transform gateway output to mcp-config.json"; exit 1; }

    The jq sub() function will also runtime-error if any url value is null (which shouldn't happen with well-formed MCPG output, but is worth guarding). An alternative is (.url // "") |= sub(...) to handle null gracefully.
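The null-guard alternative can be sketched like this. It is an assumption-level illustration, not the template's actual filter; servers without a url pass through untouched instead of crashing the sub() call.

```shell
# Null-safe variant of the URL rewrite: null urls are left as-is.
RESULT=$(echo '{"mcpServers":{"a":{"url":null},"b":{"url":"http://127.0.0.1:9/x"}}}' |
  jq -c '.mcpServers |= map_values(
        .url |= (if . == null then .
                 else sub("^http://127\\.0\\.0\\.1"; "http://host.docker.internal")
                 end))')
echo "$RESULT"
```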

⚠️ Suggestions

  • src/compile/onees.rs:59 — Minor whitespace typo introduced in the diff: the comment // Generate 1ES-specific setup/teardown jobs(no per-job pool is missing a space before (no. Trivial.

✅ What Looks Good

  • exit 1 after ##vso[task.complete result=Failed] — Both the health-check failure path and the new gateway-output-ready failure path now include exit 1. Previously the health check failure set the task result but let the script keep running. This is a genuine correctness fix.
  • Polling loop design — Polling separately for health (/health endpoint, 30s) and then for stdout flush (mcpServers in gateway output, 15s) is the right call. These are genuinely separate readiness signals.
  • jq is not a new dependency — Already used in the threat-analysis step (lines 581–587 of base.yml), so no new agent image requirement.
  • chmod 600 on the generated config files is a good hardening touch — the config contains the bearer token.
  • Clean dead-code removal — generate_mcp_client_config() and its 3 tests are fully deleted, no orphaned marker handlers left behind. Both base.yml and 1es-base.yml are kept in sync.
  • jq expression correctness — The sub("^(redacted) "\($prefix)/") regex correctly handles both (127.0.0.1/redacted) and paths without trailing components. The map(.value.url |= ... | .value.tools = [...]) chaining is valid jq.

Generated by Rust PR Reviewer for issue #252 · ● 729.7K ·

Copy /tmp/gh-aw/mcp-logs/ into staging/logs/mcpg/ so MCPG gateway logs,
stderr output, and gateway-output.json are available in the published
pipeline artifact for post-run debugging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

🔍 Rust PR Review

Summary: Solid approach — the runtime conversion correctly mirrors gh-aw's pattern, but there's one real bug around jq error handling that should be fixed before merge.

Findings

🐛 Bugs / Logic Issues

  • base.yml:295, 1es-base.yml:345 — Silent jq failure creates empty/invalid mcp-config.json

    jq --arg prefix "(redacted) \
      '.mcpServers |= (...)' \
      "$GATEWAY_OUTPUT" > /tmp/awf-tools/mcp-config.json

    If jq exits non-zero (malformed gateway output, unexpected schema, etc.), the shell redirect still creates an empty file and the script continues. The cp and chmod steps then propagate an empty config. The agent will start but silently have no MCP server access — an extremely confusing failure mode with no obvious error in the logs. Since there's no set -e in this step, the exit code is lost.

    Fix: add explicit failure signaling:

    jq --arg prefix "(redacted) \
      '.mcpServers |= (...)' \
      "$GATEWAY_OUTPUT" > /tmp/awf-tools/mcp-config.json \
      || { echo "##vso[task.complete result=Failed]jq transformation of gateway output failed"; exit 1; }
  • base.yml:272-285, 1es-base.yml:320-335 — Gateway output poll doesn't detect MCPG crash

    After health check passes, the 15-second poll loop checks only for valid file output but never tests if MCPG is still alive (kill -0 $MCPG_PID). If MCPG crashes after the health check (e.g., OOM, unhandled config error), the loop will spin the full 15 seconds before failing — wasting wall-clock time and obscuring the real cause.

    Simple fix in the loop body:

    if ! kill -0 $MCPG_PID 2>/dev/null; then
      echo "##vso[task.complete result=Failed]MCPG process died while waiting for gateway output"
      exit 1
    fi
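The suggested liveness check folds into the poll loop like this. A hedged sketch: `sleep 30 &` stands in for the MCPG container process, and the gateway file is pre-seeded so the loop succeeds on its first pass.

```shell
# Liveness-aware gateway-output poll (simulated process and output file).
sleep 30 & MCPG_PID=$!
GATEWAY_OUTPUT=$(mktemp)
echo '{"mcpServers":{}}' > "$GATEWAY_OUTPUT"

GATEWAY_READY=0
for i in $(seq 1 15); do
  # Bail out early instead of spinning if MCPG died after its health check
  if ! kill -0 "$MCPG_PID" 2>/dev/null; then
    echo "MCPG process died while waiting for gateway output" >&2
    break
  fi
  if jq -e '.mcpServers' "$GATEWAY_OUTPUT" >/dev/null 2>&1; then
    GATEWAY_READY=1
    break
  fi
  sleep 1
done
kill "$MCPG_PID" 2>/dev/null || true
echo "GATEWAY_READY=$GATEWAY_READY"
```

On a crash this reports the real cause immediately rather than burning the full 15-second budget on a file that will never appear.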

⚠️ Suggestions

  • base.yml:288, 1es-base.yml:337 — jq URL rewrite regex only matches (redacted)

    The sub("^(redacted) ...) pattern silently no-ops for https:// URLs. In practice MCPG outputs (redacted) for local endpoints, so this won't be hit — but the gateway output would still become valid JSON with unrewritten https:// URLs that the AWF-sandboxed agent can't reach. A defensive warning (e.g., via jq's has or debug()) or a pre-check assertion would make future surprises easier to diagnose.

✅ What Looks Good

  • The cleanup sequence (rm -f "$GATEWAY_OUTPUT" before starting MCPG) correctly handles restarts after interrupted runs — the old compile-time approach had no equivalent protection.
  • chmod 600 on the generated config is a nice security touch since the file contains the gateway API key.
  • Splitting the MCPG step into health-check → gateway-output poll is the right decomposition; health passing ≠ stdout flushed is a real subtle gotcha that the comment correctly calls out.
  • MCPG stderr being tee'd to both the pipeline console and a log file (then collected in the log-copy step) is excellent observability improvement.
  • Removing the three generate_mcp_client_config tests is correct — they were testing a now-removed code path, not duplicating coverage of something still live.

Generated by Rust PR Reviewer for issue #252 · ● 439.7K ·

MCPG has a 30s startup timeout for stdio MCP containers. The previous
approach used npx -y @azure-devops/mcp which downloads the package at
container launch time, regularly exceeding this timeout.

The AzureDevOpsExtension now emits a prepare_steps() that builds a local
Docker image (ado-mcp-cached:latest) with @azure-devops/mcp pre-installed
via npm install -g. The MCPG config references this cached image, so the
container starts instantly without any network download.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

🔍 Rust PR Review

Summary: Looks good overall — the runtime approach is clearly the right fix. A few items worth addressing before merge.

Findings

🐛 Bugs / Logic Issues

  • base.yml + 1es-base.yml (MCPG step): jq sub() errors on null URL
    The transform .value.url |= sub("^(redacted) ...) will produce a jq error if any server entry in the gateway output lacks a url field. jq errors cause the step to exit non-zero without writing mcp-config.json, silently breaking MCP connectivity. In current MCPG builds all servers get HTTP URLs, but this is an implementation assumption that could break on MCPG upgrades. Guard it:

    .value.url |= if . then sub("^(redacted) "\($prefix)/") else . end
  • extensions.rs prepare_steps(): intermediate container leaks on failure
    ADO bash: tasks run with set -eo pipefail. If docker start -a "$CONTAINER_ID" fails (e.g., npm install -g error on a transient network issue), the script exits before docker rm "$CONTAINER_ID" runs, leaving an orphaned container. Add a trap:

    trap "docker rm \"$CONTAINER_ID\" 2>/dev/null || true" EXIT

    before docker start.

⚠️ Suggestions

  • base.yml + 1es-base.yml: jq availability
    jq is now required by both the gateway-output polling loop (jq -e '.mcpServers') and the transform step. Neither base template appears to explicitly install jq. If the agent pool image doesn't have it, the GATEWAY_READY poll silently times out after 15s with a confusing "Gateway output file not ready" error. Worth adding a preflight check (command -v jq || apt-get install -y jq) or a comment confirming jq is pre-installed on AZS-1ES-L-MMS-ubuntu-22.04.

  • extensions.rs mcpg_servers(): entrypoint arg with globally-installed package
    After npm install -g @azure-devops/mcp, the MCPG entrypoint is npx @azure-devops/mcp. This works if the package registers a bin alias matching its scoped name, but npm install -g registers the binary by whatever name the package declares in its "bin" field — not necessarily @azure-devops/mcp. You may want to confirm the installed binary name (npm ls -g --depth 0 @azure-devops/mcp) and use it directly (e.g. node /path/to/server.js or the actual binary name) rather than going through npx.

  • onees.rs line 59: Minor whitespace regression — setup/teardown jobs(no per-job pool is missing a space before (.
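The jq-availability preflight suggested above could be factored into a small helper. This is a sketch under the assumption that a clear early failure is preferred over a confusing 15s poll timeout; the helper name is hypothetical.

```shell
# Hypothetical preflight helper: fail loudly if a required tool is missing.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "required tool '$1' not found on the agent image" >&2
    return 1
  }
}

# The MCPG step would run something like:
#   require_cmd jq || sudo apt-get install -y jq
```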

✅ What Looks Good

  • The chmod 600 on both mcp-config.json locations is a good addition — the file contains a bearer token.
  • The two-phase wait (health check → gateway output poll) is the right approach; health-passing doesn't guarantee stdout flush.
  • Test coverage for prepare_steps and cached-image entrypoint args is clean and sufficient.
  • Clean removal of the compile-time generate_mcp_client_config function and its tests — net -89 lines, well-motivated.

Generated by Rust PR Reviewer for issue #252 · ● 797.2K ·

The ADO MCP container runs via npx which needs npm registry access to
resolve @azure-devops/mcp. AWF's host-level iptables rules block outbound
traffic from MCPG-spawned containers unless the domains are allowlisted.

Add 'node' ecosystem identifier to AzureDevOpsExtension::required_hosts()
so npm/node domains are included in AWF's allowed domains list.

Also reverts the cached image approach (pre-install via docker commit)
as allowing network access is the correct fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

🔍 Rust PR Review

Summary: Looks good — the root cause fix is correct and the runtime conversion approach is sound. Two minor issues worth addressing; the comment typo is cosmetic, but one item below is a code-level correctness concern.


Findings

⚠️ Suggestions

  • src/data/base.yml:304–306 / 1es-base.yml (same) — The jq transform writes directly to /tmp/awf-tools/mcp-config.json without an error guard. If jq fails (e.g., the gateway output is malformed despite passing .mcpServers check), bash does not have set -e active at that point in the script, so execution continues to cp and cat, silently writing an empty file. Consider either checking exit code explicitly or doing an atomic write:

    jq ... "$GATEWAY_OUTPUT" > /tmp/awf-tools/mcp-config.json.tmp && \
      mv /tmp/awf-tools/mcp-config.json.tmp /tmp/awf-tools/mcp-config.json || \
      { echo "##vso[task.complete result=Failed]jq transform failed"; exit 1; }

    In practice this is a very low-likelihood failure since the output was just validated, but the silent-empty-file failure mode is hard to diagnose.

  • tests/compiler_tests.rs — no test verifying node ecosystem expansion for the ADO extension — The unit test in extensions.rs correctly verifies that required_hosts() returns "node", and the generate_allowed_domains() code at common.rs:1872 correctly expands it via is_ecosystem_identifier. But test_fixture_azure_devops_mcp_compiled_output doesn't assert that npm domains (e.g., registry.npmjs.org) appear in the compiled --allow-domains output. Worth adding a single line:

    assert!(compiled.contains("registry.npmjs.org"), "ADO MCP requires npm registry for npx");
  • src/compile/onees.rs:59 — Minor typo introduced: // Generate 1ES-specific setup/teardown jobs(no per-job pool, uses templateContext). — missing space before (.

✅ What Looks Good

  • Root cause fix is correct. The compile-time config couldn't reflect MCPG's actual runtime state (ports, keys, server list after feature detection). Runtime conversion via jq is the right architecture.
  • Ecosystem identifier for ADO MCP is correct. AzureDevOpsExtension::required_hosts() returning "node" gets properly expanded through generate_allowed_domains() (common.rs:1872) — verified in code. The npx -y @azure-devops/mcp invocation in the MCPG container genuinely needs npm registry access.
  • Polling strategy is well-designed. Separating the HTTP health check (MCPG is up) from the stdout poll (config has been flushed to disk) correctly handles the async stdio buffering race.
  • chmod 600 on the MCP config is a good security improvement — the file contains the gateway API key.
  • MCPG stderr surfaced on failure — previously this was invisible; now it appears in the pipeline log when the health check fails.
  • All 66 tests passing with 3 now-obsolete tests cleanly removed confirms no regression.

Generated by Rust PR Reviewer for issue #252 · ● 1.4M ·

Two changes to surface MCPG failures early:

1. Pass DEBUG=* to the MCPG container to enable full debug logging
   on stderr (streamed live via tee). This surfaces backend launch
   details, connection errors, and timeout diagnostics.

2. Add a 'Verify MCP backends' step after MCPG starts that probes
   each configured backend with a tools/list call. This forces
   MCPG's lazy initialization and catches failures (e.g., container
   timeout, network blocked) before the agent runs. Failures are
   surfaced as ADO pipeline warnings.

Relates to #254

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
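The probe described above needs a JSON-RPC request body per backend. A hedged sketch follows: the MCP method name is real, but the exact request the step sends (and the route format) is not shown in this PR, so the builder and the commented curl line are illustrative.

```shell
# Hypothetical JSON-RPC payload builder for the backend probe.
mcp_payload() {
  method="$1"
  params="$2"
  [ -n "$params" ] || params='{}'
  printf '{"jsonrpc":"2.0","id":1,"method":"%s","params":%s}' "$method" "$params"
}

# The probe would POST this to each backend route through MCPG, e.g.:
#   curl -s -X POST "http://127.0.0.1:$MCPG_PORT/<server>" \
#     -H 'Content-Type: application/json' -d "$(mcp_payload tools/list)"
mcp_payload tools/list
```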
@github-actions
Contributor

🔍 Rust PR Review

Summary: Good approach — runtime generation is clearly correct over compile-time baking. A few concerns worth addressing before merge.

Findings

🐛 Bugs / Logic Issues

  • base.yml/1es-base.yml — DEBUG="*" risks corrupting gateway-output.json
    MCPG is started with -e DEBUG="*" and its stdout is piped directly to gateway-output.json. If any library behind DEBUG="*" emits to stdout rather than stderr (the debug npm package writes to stderr by default, but this isn't guaranteed for all transitive deps), the file will contain non-JSON data and the jq parse will fail. The polling loop would then spin 15 iterations before failing the pipeline. Consider either removing DEBUG="*" entirely, or redirecting stderr explicitly before applying debug and adding a set -o errexit guard around the jq step to get a clear error message on failure.

  • src/compile/onees.rs:59 — Minor typo introduced

    // Generate 1ES-specific setup/teardown jobs(no per-job pool

    Missing space before (. Not functionally harmful but stands out in an otherwise clean codebase.

🔒 Security Concerns

  • src/compile/extensions.rs — "node" ecosystem added to AWF allowlist for the wrong reason
    required_hosts() populates AWF's egress domain allowlist — the network filter that restricts the Copilot agent running inside the AWF Docker container. But the comment says: "npx needs npm registry access to resolve and install the package."
    npx -y @azure-devops/mcp is invoked by MCPG, which runs on the host with --network host — fully outside AWF's reach. The AWF allowlist has no effect on what MCPG or its spawned containers can reach. Adding the entire Node ecosystem to AWF's allowlist (registry.npmjs.org, cdn.npmjs.com, etc.) expands what the agent itself can reach, which is the opposite of intended. If the actual goal is to allow the agent to call npm/node commands, that's a separate concern tracked by required_bash_commands(). This should either be removed, or the justification should be corrected.

⚠️ Suggestions

  • base.yml/1es-base.yml — Probe --max-time 120 is unusually long
    A 2-minute curl timeout per backend means a pipeline with e.g. 3 backends could stall for 6 minutes before this step produces any output. Since the goal is startup readiness verification (MCPG already passed health check), --max-time 30 seems more appropriate. The "lazy launch" concern is valid but most npx installs on a warm cache complete in seconds.

  • base.yml/1es-base.yml — GATEWAY_OUTPUT path inconsistency
    The gateway output file lives at /tmp/gh-aw/mcp-config/gateway-output.json, but logs go to /tmp/gh-aw/mcp-logs/. The path segment mcp-config suggests this is where the derived config lives, but the file is an intermediate artifact. A path like /tmp/gh-aw/mcpg/gateway-output.json would be more self-explanatory alongside the /tmp/awf-tools/mcp-config.json output. Not a bug, just a readability note.

  • base.yml/1es-base.yml — the ##vso[task.complete result=Failed] before exit 1 pattern is now consistently applied
    The original code was missing exit 1 after task.complete for the health check failure path. The new code fixes this in both templates. ✓

✅ What Looks Good

  • The runtime jq transform is clean and correctly handles the 127.0.0.1 → host.docker.internal rewrite. The regex ^(redacted) safely matches any host:port prefix without needing the port to be known at transform time.
  • Removing the generate_mcp_client_config() Rust function eliminates an entire class of compile-time/runtime drift bugs — this is the right call.
  • The polling approach (health check + JSON validity check as two separate gates) correctly handles the race where health returns 200 before stdout is fully flushed.
  • chmod 600 on the generated config files is good hygiene for files containing the API key.
  • The new "Verify MCP backends" step as a non-fatal warning is appropriate — it surfaces problems without blocking agents that only need a subset of configured backends.

Generated by Rust PR Reviewer for issue #252 · ● 629.1K ·

jamesadevine and others added 4 commits April 18, 2026 00:17
The probe was using curl -sf which swallows error responses (exit 22
on 4xx). Now captures HTTP status code and response body separately
so we can see what MCPG actually returns on failure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Secret variables set via ##vso[task.setvariable;issecret=true] must be
explicitly mapped via env: to be available in bash steps. The probe was
using $(MCP_GATEWAY_API_KEY) macro syntax in the script body, but bash
interprets $(...) as command substitution before ADO can expand it.

Pass the key via env: mapping as MCPG_API_KEY and reference it as
$MCPG_API_KEY in the script.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MCPG spec 7.1 requires the API key directly in the Authorization
header, not as a Bearer token. The probe was sending
'Authorization: Bearer <key>' but MCPG expects 'Authorization: <key>'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MCP requires an initialize handshake before any other method. The probe
now sends initialize first, captures the Mcp-Session-Id from the response
header, then sends tools/list with that session ID. Also parses SSE
data lines to extract the tool count.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
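The two pieces of parsing this commit describes — pulling the Mcp-Session-Id response header and extracting the SSE "data:" line — can be sketched like this. The header and body below are fixtures, not real MCPG responses, and the field names are taken from the commit message rather than the actual script.

```shell
# Simulated response-header capture (curl -D would write something similar)
HEADERS=$(mktemp)
printf 'HTTP/1.1 200 OK\r\nMcp-Session-Id: abc123\r\nContent-Type: text/event-stream\r\n\r\n' > "$HEADERS"
# Extract the session id case-insensitively, stripping the CR
SESSION_ID=$(awk -F': ' 'tolower($1) == "mcp-session-id" { gsub(/\r/, ""); print $2 }' "$HEADERS")

# Parse an SSE "data:" line and count the advertised tools
SSE_BODY='data: {"result":{"tools":[{"name":"a"},{"name":"b"}]}}'
TOOL_COUNT=$(printf '%s\n' "$SSE_BODY" | sed -n 's/^data: //p' | jq '.result.tools | length')

echo "session=$SESSION_ID tools=$TOOL_COUNT"
```

The session id would then be replayed on the follow-up tools/list request, since MCP requires the initialize handshake to precede any other method.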
@github-actions
Contributor

🔍 Rust PR Review

Summary: Solid approach — the runtime conversion correctly mirrors gh-aw's pattern and fixes a real bug, but there's a silent-failure risk in the jq transform and a debug flag left permanently enabled.

Findings

🐛 Bugs / Logic Issues

  • base.yml / 1es-base.yml — jq failure is silently swallowed
    The jq transform that produces mcp-config.json has no error guard:

    jq --arg prefix "(redacted) \
      '.mcpServers |= ...' \
      "$GATEWAY_OUTPUT" > /tmp/awf-tools/mcp-config.json

    If jq fails (malformed gateway output, missing mcpServers key, etc.) the > redirection truncates the output file to zero bytes while the step exits 0. The agent then starts with an empty mcp-config.json and all MCP tool calls fail with confusing "server not found" errors rather than a clear pipeline failure.
    Fix: write to a temp file and only move on success:

    jq ... "$GATEWAY_OUTPUT" > /tmp/mcp-config.tmp \
      && mv /tmp/mcp-config.tmp /tmp/awf-tools/mcp-config.json \
      || { echo "##vso[task.complete result=Failed]jq failed to transform gateway output"; exit 1; }

    Same applies to the 1es-base.yml copy.

  • -e DEBUG="*" is permanent (base.yml L278, 1es-base.yml L282)
    This was presumably added to diagnose the new stdout-capture approach, but it's unconditionally baked into every generated pipeline. MCPG debug output at this level is very verbose and will make production pipeline logs difficult to read. Should be removed or gated behind an ADO variable (e.g., ${{ if eq(variables['System.Debug'], 'true') }}).

⚠️ Suggestions

  • Probe curl timeout is 120s per backend
    In the "Verify MCP backends" step, each curl call has --max-time 120. With multiple MCP servers configured, a hung backend could stall the probe step for several minutes before surfacing the failure. A shorter timeout (e.g., 30s) would fail faster while still allowing for slow npx installs. The 120s value seems more appropriate for the agent's actual tool calls than a startup health probe.

  • Minor: missing space in comment (src/compile/onees.rs, line ~60)
    // Generate 1ES-specific setup/teardown jobs(no per-job pool → missing space before (no.

✅ What Looks Good

  • The overall approach is correct — runtime conversion from MCPG's actual stdout is fundamentally more reliable than compile-time prediction.
  • The two-phase poll (health check → gateway JSON ready) is robust and correctly handles the MCPG stdout-flush race condition.
  • Adding "node" to AzureDevOpsExtension::required_hosts() is a legitimate fix — without npm registry access, npx -y @azure-devops/mcp would fail in the network-isolated sandbox.
  • chmod 600 on the generated config files is a good security touch.
  • MCPG stderr log collection and the eager backend probe step are both valuable operational improvements.
  • The jq expression correctly preserves headers, type, and other fields while only rewriting url and setting tools.

Generated by Rust PR Reviewer for issue #252 · ● 816K ·

jamesadevine and others added 3 commits April 18, 2026 08:21
AWF's DOCKER-USER iptables rules block outbound traffic from containers
on Docker's default bridge network. The ADO MCP container (spawned by
MCPG as a sibling container) was unable to reach dev.azure.com, causing
MCP error -32001 (request timed out) on every tool call.

Adding --network host via the args field bypasses the FORWARD chain
rules, matching gh-aw's approach for its built-in agentic-workflows
MCP server (see mcp_config_builtin.go).

SafeOutputs was unaffected because it runs as an HTTP backend on
localhost — no FORWARD chain involved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move debug/diagnostic pipeline additions behind a compile-time flag
(--debug-pipeline, debug builds only) matching the --skip-integrity
pattern:

- MCPG DEBUG=* env var and stderr tee → {{ mcpg_debug_flags }} marker
- MCP backend probe step → {{ verify_mcp_backends }} marker

When --debug-pipeline is not passed (the default), both markers resolve
to empty strings and the generated pipeline contains no debug
diagnostics. MCPG log artifact collection is kept always-on as it is
low-cost and useful in production.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
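The empty-vs-populated marker behavior can be shown with a toy substitution, where sed stands in for the Rust compiler's template replacement; the template string here is a simplified stand-in, not the real docker run command.

```shell
# Toy marker substitution: debug on vs. debug off (the default).
TEMPLATE='docker run {{ mcpg_debug_flags }} image:v1'

# --debug-pipeline: marker resolves to the debug env flag
WITH_DEBUG=$(printf '%s' "$TEMPLATE" | sed 's|{{ mcpg_debug_flags }}|-e DEBUG="*"|')

# default: marker (and its trailing space) resolves to nothing
WITHOUT=$(printf '%s' "$TEMPLATE" | sed 's|{{ mcpg_debug_flags }} ||')

echo "$WITH_DEBUG"
echo "$WITHOUT"
```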
The ADO MCP container still times out on API calls despite --network host.
Adding DEBUG=* enables verbose logging from the @azure-devops/mcp npm
package to surface the root cause (auth issues, DNS, proxy, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

🔍 Rust PR Review

Summary: The runtime MCP config approach is a solid improvement, but there's a bash syntax bug in the debug-pipeline flag expansion that would break MCPG startup when --debug-pipeline is used.


Findings

🐛 Bugs / Logic Issues

  • src/data/base.yml + src/data/1es-base.yml — {{ mcpg_debug_flags }} placement

    The debug flags replacement includes both an env var flag AND a stderr redirect:

    let mcpg_debug_flags = r##"-e DEBUG="*" \
            2> >(tee /tmp/gh-aw/mcp-logs/stderr.log >&2)"##

    But the 2> >(...) redirect is placed in the middle of the docker run multi-line command:

              -e MCP_GATEWAY_API_KEY="$(MCP_GATEWAY_API_KEY)" \
              {{ mcpg_debug_flags }}
              {{ mcpg_docker_env }}
              {{ mcpg_image }}:v{{ mcpg_version }} \
              --routed ... \
              > "$GATEWAY_OUTPUT" &

    In bash, a shell redirect (2>) terminates argument parsing for the current command — it is not a docker flag. Once 2> >(tee ...) is emitted, everything after it ({{ mcpg_docker_env }}, the image name, --routed, and > "$GATEWAY_OUTPUT") is no longer part of docker run. The expanded command would be:

    docker run ... -e MCP_GATEWAY_API_KEY="..." -e DEBUG="*" 2> >(tee /tmp/gh-aw/mcp-logs/stderr.log >&2)
    # ↑ docker run ends here — NO image name, no --routed args
    -e AZURE_DEVOPS_EXT_PAT="$(SC_READ_TOKEN)"  # ← separate (invalid) command
    ghcr.io/github/gh-aw-mcpg:v... \            # ← separate (invalid) command
    --routed ... > "$GATEWAY_OUTPUT" &           # ← never runs as part of docker

    Docker would fail with "docker: Error response from daemon: ... invalid reference format" or similar.

    Fix: Split the replacement — put only -e DEBUG="*" \ in {{ mcpg_debug_flags }}, and introduce a separate {{ mcpg_debug_stderr }} marker placed after > "$GATEWAY_OUTPUT" but before &:

              {{ mcpg_image }}:v{{ mcpg_version }} \
              --routed ... \
              > "$GATEWAY_OUTPUT" {{ mcpg_debug_stderr }} &

    Where {{ mcpg_debug_stderr }} expands to 2> >(tee /tmp/gh-aw/mcp-logs/stderr.log >&2) when debug is on, and empty otherwise.

    Same issue exists in 1es-base.yml.

⚠️ Suggestions

  • src/compile/extensions.rs:459 — ecosystem identifier in required_hosts

    Pushing "node" into required_hosts() is technically correct (the domain expansion logic handles ecosystem identifiers), but mixing raw hosts and ecosystem identifiers in the same return type is undocumented and easy to break. A comment stating this is an ecosystem identifier (not a raw domain) would help future maintainers:

    // "node" is an ecosystem identifier — expands to npm/registry.npmjs.org domains
    hosts.push("node".to_string());
  • src/data/base.yml — gateway output poll timeout

    The outer health-check loop retries 30× at 1s each (30s). The inner gateway output poll retries only 15× at 1s (15s). If a slow container startup stalls MCPG's stdout flush past 15s, the step fails even though MCPG is healthy. Consider increasing to 30 iterations to match the health-check budget, or at least document why 15s is sufficient.

✅ What Looks Good

  • Runtime jq transform is the right design: Eliminating compile-time mcp-config.json generation removes a whole class of "config doesn't match runtime state" bugs. The polling approach (health check → stdout flush validation) correctly handles the race between MCPG readiness and stdout availability.
  • Marker ordering in compile_shared: Processing {{ mcpg_debug_flags }}/{{ verify_mcp_backends }} before shared replacements (step 13 vs 14) is correctly sequenced so {{ mcpg_port }} inside the probe step YAML gets substituted properly.
  • --network host for ADO MCP: The reasoning is solid (AWF DOCKER-USER iptables rules block bridge forwarding), well-commented, and matches gh-aw's approach. The compile-time node ecosystem host addition ensures npm/npx can resolve @azure-devops/mcp.
  • Tests for generate_debug_pipeline_replacements: The two new unit tests cover both enabled and disabled code paths with meaningful assertions.
  • MCPG entrypoint workaround comment: The inline comment explaining the validate_port_mapping() bug and the upstream issue reference is exactly the right level of documentation for a workaround.

Note

🔒 Integrity filter blocked 15 items

The following items were blocked because they don't meet the GitHub integrity level.

  • dd43404 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • 75a87e9 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • da394de list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • 0bc1c79 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • 81542f7 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • 8c7ee23 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • e240953 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • f1b51f2 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • 589660b list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • afdee6b list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • 9dc4557 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • b4ec6b7 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • 0d09110 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • 2150cdb list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • 9b0568e list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by Rust PR Reviewer for issue #252 · ● 3.2M ·

jamesadevine and others added 6 commits April 18, 2026 08:50
Add 12 new compiler integration tests covering:

- --skip-integrity: omits integrity step, produces valid YAML (both targets)
- --debug-pipeline: includes DEBUG env, stderr tee, probe step, produces
  valid YAML (both targets)
- Combined flags work together
- Default (no flags) excludes debug content and includes integrity step
- Probe step indentation is correct for both standalone (8sp) and 1ES (18sp)

Also fix replacement content indentation: replace_with_indent adds the
marker's indent to continuation lines, so replacement content must have
zero base indent.

Also refactor compile_fixture into compile_fixture_with_flags with
atomic counter for unique temp dirs to prevent parallel test collisions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Debug replacements must run BEFORE extra_replacements so that
{{ mcpg_port }} inside the probe step content gets resolved by the
subsequent extra_replacements pass. Previously, debug replacements
ran after extra_replacements, leaving {{ mcpg_port }} unresolved.

Add test_debug_pipeline_no_unresolved_markers to catch this class of
ordering bugs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fix bash line continuation in the MCPG docker run command:
- mcpg_debug_flags emits backslash when disabled (maintains line
  continuation) and -e DEBUG env flag when enabled
- Stderr tee is always-on (baked into template), not debug-gated
- Fix replacement ordering: debug before extra_replacements so
  mcpg_port in probe step content gets resolved
- Drop mcpg_stderr_redirect marker (back to 2 markers total)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the ADO-specific special case in generate_mcpg_docker_env with
a generic mechanism driven by the CompilerExtension trait.

New trait method: required_pipeline_vars() -> Vec<PipelineEnvMapping>
  - Extensions declare which container env vars map to pipeline variables
  - AzureDevOpsExtension maps AZURE_DEVOPS_EXT_PAT -> SC_READ_TOKEN

The compiler generates the full chain generically:
  1. env: block on MCPG step (ADO secret -> bash var)
  2. -e flag on MCPG docker run (bash var -> MCPG process env)
  3. MCPG config keeps empty string (MCPG passthrough to child)

ADO secrets require env: mapping because Azure DevOps does not expand
secret variables via macro syntax in script bodies.

Also updates azure-devops-mcp-agent fixture to use first-class
tools.azure-devops instead of manual mcp-servers configuration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…KEN)

The @azure-devops/mcp npm package does NOT use AZURE_DEVOPS_EXT_PAT
(that's for the az devops CLI extension). It accepts auth type via
CLI arg (-a) and reads the token from a specific env var per type.

For pipeline use with bearer tokens from az account get-access-token:
- Add '-a envvar' to entrypoint args (selects envvar auth mode)
- Map ADO_MCP_AUTH_TOKEN (not AZURE_DEVOPS_EXT_PAT) to SC_READ_TOKEN
- The ADO MCP reads the bearer token from ADO_MCP_AUTH_TOKEN

Auth types in @azure-devops/mcp (src/auth.ts):
- 'pat': reads PERSONAL_ACCESS_TOKEN (base64 PAT)
- 'envvar': reads ADO_MCP_AUTH_TOKEN (bearer token) ← we use this
- 'azcli'/'env': uses DefaultAzureCredential
- 'interactive': OAuth browser flow (default, breaks stdio)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

🔍 Rust PR Review

Summary: Solid architecture with two real bugs and one regression worth fixing before merge.


Findings

🐛 Bugs / Logic Issues

  • tests/compiler_tests.rs ~line 3114: test_debug_pipeline_probe_step_indentation_standalone is missing #[test]. The 1ES counterpart immediately below it has the attribute, but this function does not. Rust will compile it as dead code (and warn) but it will never run as a test. The indentation check it contains won't catch regressions.

    // missing #[test] here ↓
    fn test_debug_pipeline_probe_step_indentation_standalone() {
  • src/compile/extensions.rs ~line 548: DEBUG: "*" is hardcoded in the ADO MCP container env unconditionally — not gated on --debug-pipeline. Every production run using tools: azure-devops: will have @azure-devops/mcp emitting full debug logs to stdout/stderr. This looks like debug scaffolding left in accidentally. The --debug-pipeline flag controls DEBUG="*" on the MCPG container itself; this hardcoded value separately controls the child ADO MCP container and bypasses that gate.

    let env = Some(HashMap::from([
        ("ADO_MCP_AUTH_TOKEN".to_string(), String::new()),
        ("DEBUG".to_string(), "*".to_string()),  // ← always set, even in production
    ]));

⚠️ Suggestions

  • src/compile/extensions.rs / src/compile/common.rs — The old generate_mcpg_docker_env emitted a compile-time eprintln! warning when tools: azure-devops: was enabled but permissions.read was absent ("token will be empty at runtime"). The new required_pipeline_vars() returns the ADO_MCP_AUTH_TOKEN → SC_READ_TOKEN mapping unconditionally without that guard. If a user configures the ADO tool without permissions.read, $(SC_READ_TOKEN) expands to empty at runtime — silent auth failure. Consider adding a check in AzureDevOpsExtension::validate() to restore the warning.

  • src/data/base.yml / src/data/1es-base.yml — The env: block for the MCPG step is always emitted (it's in the template), even when generate_mcpg_step_env returns an empty string. The result is env:\n which is valid YAML (env: null) and ADO handles it fine, but it would be cleaner to omit the block when empty — or handle this the same way as other nullable blocks in the templates.

  • src/compile/common.rs step 12 comment — The comment says {{ mcpg_port }} is "resolved by extra_replacements" but {{ mcpg_port }} is actually resolved in step 14 (shared replacements). The ordering still works correctly so this is comment-only, but it may confuse future contributors.

✅ What Looks Good

  • The CompilerExtension::required_pipeline_vars() abstraction is a clean extension point — much better than the ad-hoc AZURE_DEVOPS_EXT_PAT special-casing it replaces.
  • The two-phase MCPG readiness poll (health check → JSON validity) is the right fix for the race between health endpoint and stdout flush.
  • The jq URL rewrite is correct and mirrors gh-aw's convert_gateway_config_copilot.cjs faithfully.
  • The compile_fixture_with_flags refactor with an atomic counter for unique temp dirs is a nice improvement that prevents collisions in parallel test runs.
  • --network host on the ADO MCP container is well-documented inline with a reference to the upstream MCPG issue.

Generated by Rust PR Reviewer for issue #252 · ● 1.2M ·
