
feat: enable Anthropic prompt caching for system prompts #15

Open
samarthpatel24 wants to merge 1 commit into microsoft:main from samarthpatel24:feat/anthropic-prompt-caching

Contributor

@samarthpatel24 samarthpatel24 commented May 27, 2026

Problem:

The Anthropic backend re-sends the full system prompt (5K+ tokens) on every single agent step at full price. On a 100-step Claude Opus run, that's ~$7.50 burned on the same static system prompt repeated 100 times. Ironically, the codebase already tracks cached_input_tokens, but the value is always 0 because caching was never wired up.

Fix:

Turns on Anthropic prompt caching by sending the system prompt as a content block with cache_control set to {"type": "ephemeral"}. Step 1 writes to the cache (1.25x the base input price); steps 2–100 read from it (0.1x). That's roughly 90% savings on system prompt input cost per run.
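
The savings estimate can be sanity-checked with a few lines of arithmetic. This is only a sketch under assumptions the PR does not state: a 5,000-token system prompt and a base input price of $15 per million tokens (picked because it reproduces the ~$7.50 figure above). The 1.25x and 0.1x multipliers are the ones quoted in the fix description, and `system_prompt_cost` is a hypothetical helper, not code from this PR.

```python
def system_prompt_cost(tokens, steps, price_per_mtok, cached):
    """Return the input cost (USD) of sending the system prompt on every step."""
    base = tokens * price_per_mtok / 1_000_000  # cost of one uncached send
    if not cached:
        return base * steps
    # Step 1 writes the cache (1.25x); the remaining steps read it (0.1x).
    return base * (1.25 + 0.1 * (steps - 1))


# Assumed inputs: 5,000-token prompt, 100 steps, $15 per million input tokens.
uncached = system_prompt_cost(5_000, 100, 15.0, cached=False)
cached = system_prompt_cost(5_000, 100, 15.0, cached=True)
savings = 1 - cached / uncached
print(f"uncached=${uncached:.2f} cached=${cached:.2f} savings={savings:.1%}")
```

Under these assumptions the cached run costs well under a dollar for the system prompt, and the saving comes out just under 89%, which is consistent with the "roughly 90%" claim. The estimate also assumes every step after the first actually hits the cache.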

Changes:

  • anthropic_model.py — System prompt sent as content block array with cache_control
  • anthropic_model.py — Metrics helper now handles both string and list-of-blocks system format
  • anthropic_model.py — Usage parser now captures cache_creation_input_tokens
  • base.py — Added cache_creation_input_tokens to usage metric keys
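
For reviewers who want a concrete picture of the anthropic_model.py changes, here is a minimal sketch. It is not the code in this PR: the function names (`build_system_blocks`, `system_text`, `parse_usage`) are hypothetical, the usage data is assumed to be available as a plain dict, and mapping the API's cache_read_input_tokens onto the existing cached_input_tokens metric is an assumption. The content block shape and the usage field names follow the Anthropic Messages API.

```python
from typing import Any, Dict, List, Union

SystemPrompt = Union[str, List[Dict[str, Any]]]


def build_system_blocks(system_prompt: str) -> List[Dict[str, Any]]:
    """Wrap the system prompt in a single text block marked as cacheable."""
    return [
        {
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }
    ]


def system_text(system: SystemPrompt) -> str:
    """Return the plain text of a system prompt in either format."""
    if isinstance(system, str):
        return system
    # List-of-blocks format: concatenate the text of every text block.
    return "".join(
        block.get("text", "") for block in system if block.get("type") == "text"
    )


def parse_usage(usage: Dict[str, Any]) -> Dict[str, int]:
    """Map API usage fields onto metric keys, treating missing/None as 0."""
    return {
        "input_tokens": int(usage.get("input_tokens") or 0),
        "output_tokens": int(usage.get("output_tokens") or 0),
        # Assumed mapping: cache reads feed the existing cached_input_tokens key.
        "cached_input_tokens": int(usage.get("cache_read_input_tokens") or 0),
        "cache_creation_input_tokens": int(
            usage.get("cache_creation_input_tokens") or 0
        ),
    }
```

The list returned by `build_system_blocks` would be passed as the `system` parameter of the request in place of the bare string. Handling both formats in `system_text` keeps any metrics code that measures the prompt working regardless of which form it receives.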

Not affected:

  • OpenAI / OpenRouter: untouched, they just report 0 for the new metric key
  • API version: 2023-06-01 already supports caching, no bump needed
  • Short prompts: if the system prompt is under the minimum cacheable length (4096 tokens here; the minimum varies by model), the API silently skips caching (no error)

Test plan:

  • All 7 existing unit tests pass
  • Live run: confirm cache_read_input_tokens > 0 in usage metrics after step 1
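
The live-run check could be automated along these lines. This is a hedged sketch, not part of the PR: `check_cache_usage` is a hypothetical helper, and it assumes the per-step usage is collected as a list of dicts carrying the API's cache_creation_input_tokens and cache_read_input_tokens fields.

```python
def check_cache_usage(step_usages):
    """Return a list of problems found in per-step usage dicts (empty means OK)."""
    if not step_usages:
        return ["no steps recorded"]
    problems = []
    first = step_usages[0]
    wrote = first.get("cache_creation_input_tokens", 0) > 0
    read = first.get("cache_read_input_tokens", 0) > 0
    # Step 1 normally writes the cache; it may read instead if a recent run
    # already populated it.
    if not (wrote or read):
        problems.append("step 1: neither wrote nor read the cache")
    for i, usage in enumerate(step_usages[1:], start=2):
        if usage.get("cache_read_input_tokens", 0) <= 0:
            problems.append(f"step {i}: cache_read_input_tokens is 0")
    return problems
```

An empty result means every step after the first read from the cache. A non-empty one points at the steps where caching did not apply, for example because the prompt was too short to be cached.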

@samarthpatel24
Contributor Author

@microsoft-github-policy-service agree
