**Target Workflow:** `build-test.md`
**Source report:** #2000

- **Estimated cost per run:** N/A (token counts from AWF api-proxy; cost not populated)
- **Total tokens per run (avg):** ~621K (4,344K / 7 data runs)
- **Cache hit rate:** 88.5% — already excellent
- **LLM turns (avg):** 13.3 requests/run (range: 10–18)
- **Model:** claude-sonnet-4.6 via Copilot
- **Total daily token spend:** 4,344K (most expensive workflow today)
## Current Configuration

| Setting | Value |
| --- | --- |
| Tools loaded | `bash: ["*"]` + `github:` MCP (all tools, ~22) |
| GitHub toolsets | None specified — loads full default toolset |
| Network groups | 12 groups: defaults, github, node, go, rust, crates.io, java, dotnet, bun.sh, deno.land, jsr.io, dl.deno.land |
| Runtimes | node 20, go 1.22, rust stable, java 21, dotnet 8.0 |
| Pre-agent steps | ❌ None |
| Prompt size | 8,592 bytes / 203 body lines |
| Avg tokens/request | ~46.7K (large, growing context) |
Root cause of high token count: All setup operations (2 runtime installs, 8 repo clones, 1 Maven config) are invoked by the LLM as bash tool calls, adding accumulated context to every subsequent turn. With no pre-agent steps, each of these ~11 deterministic operations consumes LLM context budget unnecessarily.
## Recommendations

### 1. Move Deterministic Setup to `steps:` (Pre-Agent)

**Estimated savings:** ~190–250K tokens/run (~30–40% reduction)
The agent currently invokes bash for all of these operations inside the conversation:
- Install Bun via `curl` (1+ LLM turns)
- Install Deno via `curl` (1+ LLM turns)
- `gh repo clone` for 8 repositories (up to 8 LLM turns)
- Create `~/.m2/settings.xml` for Maven (1 LLM turn)
These 11 operations are fully deterministic — they never need LLM reasoning. Moving them to `steps:` means:
- They execute before the agent starts (no LLM turns consumed)
- Their output does not accumulate in the conversation history
- Subsequent turns have a smaller context, reducing tokens on every later call
Token savings mechanics: At ~47K tokens/request and ~13 turns/run, eliminating 4–6 early turns saves: direct turn savings (~200K) + compounded context reduction (~30–50K) = ~230–250K tokens/run.
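The arithmetic above can be sanity-checked directly; a back-of-envelope sketch using the report's averages (the turn count and context-reduction figures are the report's estimates, not measurements):

```shell
# Back-of-envelope check of the savings estimate above.
# Assumed averages from this report: ~47K tokens/request, 4-6 setup turns eliminated.
TOKENS_PER_TURN=47000
TURNS_ELIMINATED=4          # low end of the 4-6 range
CONTEXT_REDUCTION=40000     # midpoint of the ~30-50K compounded context reduction
DIRECT=$(( TOKENS_PER_TURN * TURNS_ELIMINATED ))
TOTAL=$(( DIRECT + CONTEXT_REDUCTION ))
echo "direct=${DIRECT} total=${TOTAL}"   # direct=188000 total=228000
```

At the high end (6 turns eliminated) the same arithmetic lands above the quoted ~250K ceiling, so the report's range is conservative.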
**Implementation** — add a `steps:` block to the YAML frontmatter:

```yaml
steps:
  - name: Install Bun
    run: |
      curl -fsSL (bun.sh/redacted) | bash
      echo "BUN_INSTALL=$HOME/.bun" >> "$GITHUB_ENV"
      echo "$HOME/.bun/bin" >> "$GITHUB_PATH"
  - name: Install Deno
    run: |
      curl -fsSL (deno.land/redacted) | sh
      echo "DENO_INSTALL=$HOME/.deno" >> "$GITHUB_ENV"
      echo "$HOME/.deno/bin" >> "$GITHUB_PATH"
  - name: Clone test repositories
    env:
      GH_TOKEN: ${{ secrets.GH_AW_GITHUB_MCP_SERVER_TOKEN }}
    run: |
      RESULTS=""
      for spec in \
        "Mossaka/gh-aw-firewall-test-bun:/tmp/test-bun:Bun" \
        "Mossaka/gh-aw-firewall-test-cpp:/tmp/test-cpp:C++" \
        "Mossaka/gh-aw-firewall-test-deno:/tmp/test-deno:Deno" \
        "Mossaka/gh-aw-firewall-test-dotnet:/tmp/test-dotnet:.NET" \
        "Mossaka/gh-aw-firewall-test-go:/tmp/test-go:Go" \
        "Mossaka/gh-aw-firewall-test-java:/tmp/test-java:Java" \
        "Mossaka/gh-aw-firewall-test-node:/tmp/test-node:Node.js" \
        "Mossaka/gh-aw-firewall-test-rust:/tmp/test-rust:Rust"; do
        repo=$(echo "$spec" | cut -d: -f1)
        path=$(echo "$spec" | cut -d: -f2)
        name=$(echo "$spec" | cut -d: -f3)
        if gh repo clone "$repo" "$path" 2>/dev/null; then
          RESULTS+="$name: cloned ✅\n"
        else
          RESULTS+="$name: CLONE_FAILED ❌\n"
        fi
      done
      printf "%b" "$RESULTS"
      echo "CLONE_RESULTS<<EOF" >> "$GITHUB_ENV"
      printf "%b" "$RESULTS" >> "$GITHUB_ENV"
      echo "EOF" >> "$GITHUB_ENV"
  - name: Configure Maven proxy
    run: |
      mkdir -p ~/.m2
      cat > ~/.m2/settings.xml << 'SETTINGS'
      <settings>
        <proxies>
          <proxy>
            <id>awf-http</id><active>true</active><protocol>http</protocol>
            <host>squid-proxy</host><port>3128</port>
          </proxy>
          <proxy>
            <id>awf-https</id><active>true</active><protocol>https</protocol>
            <host>squid-proxy</host><port>3128</port>
          </proxy>
        </proxies>
      </settings>
      SETTINGS
```
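Optionally, one more pre-agent step can fail fast before any LLM turn is spent if an install broke; a sketch (the step name is hypothetical, the binary paths mirror the install steps above):

```yaml
  - name: Verify runtimes
    run: |
      "$HOME/.bun/bin/bun" --version
      "$HOME/.deno/bin/deno" --version
```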
Update the prompt to reference pre-cloned paths and inform the agent of clone outcomes. Replace the per-task clone instructions with:

```markdown
**SETUP ALREADY DONE (pre-agent steps):**
- Bun is installed at `$BUN_INSTALL/bin/bun`
- Deno is installed at `$DENO_INSTALL/bin/deno`
- Maven `~/.m2/settings.xml` is configured
- Clone results: ${{ env.CLONE_RESULTS }}

Test repos are at `/tmp/test-{bun,cpp,deno,dotnet,go,java,node,rust}`.
For any ecosystem marked CLONE_FAILED, record that status and skip its tests.
```
This also lets the agent skip each CLONE_FAILED ecosystem without attempting it, eliminating the bash turn per failed ecosystem that it currently still spends.
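Inside the agent's own bash turns, skipping a failed ecosystem then reduces to a one-line check against the injected summary. A sketch (the sample `CLONE_RESULTS` value below is hypothetical, standing in for the env var exported by the clone step):

```shell
# CLONE_RESULTS is the multiline summary exported by the pre-agent clone step.
# Sample value for illustration only:
CLONE_RESULTS="Bun: cloned ✅
C++: CLONE_FAILED ❌"

# Skip any ecosystem whose clone failed, instead of spending a turn retrying it.
if printf '%s\n' "$CLONE_RESULTS" | grep -q '^C++: CLONE_FAILED'; then
  echo "skipping C++ tests"
fi
```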
### 2. Remove GitHub MCP Toolset (or restrict with `toolsets:`)

**Estimated savings:** ~15–20K tokens/run (~2–3%)
The `github:` tool loads the full GitHub MCP server (~22 tools). What the workflow actually needs:

- Repo cloning → `gh repo clone` in bash (no MCP needed)
- Posting PR comment → `safe-outputs: add-comment` (no MCP needed)
- Adding label → `safe-outputs: add-labels` (no MCP needed)
- Detecting PR vs `workflow_dispatch` trigger → available as `${{ github.event_name }}` in pre-steps
**Option A (preferred):** Remove `github:` tools entirely.

```yaml
# Before:
tools:
  bash:
    - "*"
  github:
    github-token: "${{ secrets.GH_AW_GITHUB_MCP_SERVER_TOKEN }}"

# After:
tools:
  bash:
    - "*"
```
And pass the trigger context to the agent via a pre-step:

```yaml
steps:
  - name: Set trigger context
    run: |
      echo "TRIGGER_EVENT=${{ github.event_name }}" >> "$GITHUB_ENV"
      echo "PR_NUMBER=${{ github.event.pull_request.number || '' }}" >> "$GITHUB_ENV"
```
Then update the prompt's conditional label logic to reference `$TRIGGER_EVENT` and `$PR_NUMBER`.
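That conditional logic becomes a plain environment check; a sketch (the variable names match the pre-step above, the sample values and echo messages are placeholders):

```shell
# TRIGGER_EVENT and PR_NUMBER are exported by the "Set trigger context" pre-step.
TRIGGER_EVENT="pull_request"   # sample values for illustration
PR_NUMBER="123"

if [ "$TRIGGER_EVENT" = "pull_request" ] && [ -n "$PR_NUMBER" ]; then
  echo "label PR #$PR_NUMBER"
else
  echo "workflow_dispatch run: nothing to label"
fi
```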
**Option B (minimal change):** Restrict with `toolsets:`. If GitHub MCP is needed for any PR reads, narrow the scope:

```yaml
tools:
  bash:
    - "*"
  github:
    github-token: "${{ secrets.GH_AW_GITHUB_MCP_SERVER_TOKEN }}"
    toolsets: [pull_requests]
```
This reduces from ~22 tools to ~5, saving ~10K tokens/turn in system prompt.
### 3. Trim Network Groups to Actually-Needed Ones

**Estimated savings:** ~0 tokens (runtime cost, not token cost) — but reduces attack surface
Currently loads 12 network groups. The builds that run:

- Bun → `bun.sh` ✅ needed
- C++ → no package-manager network needed (cmake + local sources) — may not need the `node` group
- Deno → `deno.land`, `jsr.io`, `dl.deno.land` ✅ needed
- .NET → `dotnet` ✅ needed
- Go → `go` ✅ needed
- Java (Maven) → `java` ✅ needed
- Node.js → `node` ✅ needed
- Rust (Cargo) → `rust`, `crates.io` ✅ needed
Network groups don't directly impact token counts, but removing unused ones (`bun.sh` for deno-only tasks, etc.) reduces potential exfiltration surface.
## Expected Impact

| Metric | Current | Projected | Savings |
| --- | --- | --- | --- |
| Total tokens/run | ~621K | ~380–430K | −31–39% |
| LLM turns/run | ~13.3 | ~7–9 | −4–6 turns |
| Effective tokens/run | ~694K | ~430–480K | −31–38% |
| Daily total (7 runs) | 4,344K | ~2,700–3,000K | ~1,300–1,600K saved/day |
| Input:Output ratio | 99:1 | ~99:1 (unchanged) | — |
| Cache hit rate | 88.5% | ~88–90% (stable) | — |
The high cache rate (88.5%) means further cache tuning offers diminishing returns. The primary lever is reducing the number of LLM turns by moving deterministic work out of the agent.
## Implementation Checklist

- [ ] Add a `steps:` block with: Install Bun, Install Deno, Clone all 8 repos, Configure Maven
- [ ] Update the prompt to reference pre-cloned paths and `${{ env.CLONE_RESULTS }}`
- [ ] Remove `github:` from `tools:` (or restrict with `toolsets: [pull_requests]`)
- [ ] Add `TRIGGER_EVENT` and `PR_NUMBER` env vars to pre-steps for conditional label logic
- [ ] Run `gh aw compile .github/workflows/build-test.md`
- [ ] Run `npx tsx scripts/ci/postprocess-smoke-workflows.ts`

Generated by Daily Copilot Token Optimization Advisor · Source: #2000