fix: handle large (1M) context windows for Opus 4.x and compaction #3519

Open
akhil29 wants to merge 2 commits into tailcallhq:main from akhil29:fix/large-context-window-handling

@akhil29 akhil29 commented Jun 15, 2026

Contributor

Fixes #3518

Two related fixes so that large context windows (e.g. Claude Opus 4.x at 1M tokens) are used fully, instead of being treated as 200K or having compaction capped to a small hardcoded threshold.

1. get_context_length() returned 200K for 1M-token Opus models

crates/forge_app/src/dto/anthropic/response.rs matched the prefix claude-opus-4- and returned 200_000, which incorrectly captured claude-opus-4-6/4-7/4-8 — all 1M-token models per crates/forge_repo/src/provider/provider.json (e.g. "id": "claude-opus-4-8", "context_length": 1000000).

Fix: add an explicit 1M branch for the claude-opus-4-6/4-7/4-8 prefixes before the generic claude-opus-4- 200K branch. Older Opus 4.x models (4, 4-1) remain at 200K. Added test_get_context_length_opus_1m_models.

Note: this get_context_length table is the fallback for the native Anthropic /models path; the vertex_ai_anthropic path already reads context_length from provider.json.
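The ordering issue in fix 1 can be illustrated with a minimal sketch. The function name `context_length_for` and its `Option<u64>` return type are assumptions for illustration only; this is not the code in response.rs, just the same prefix-ordering idea with the more specific prefixes checked first.

```rust
/// Hypothetical stand-in for the fallback table (not the real
/// get_context_length in response.rs). Specific prefixes must be
/// tested before the generic `claude-opus-4-` prefix, otherwise the
/// generic branch wins and the 1M models are reported as 200K.
pub fn context_length_for(model_id: &str) -> Option<u64> {
    // 1M-token Opus models named in the PR description.
    const ONE_MILLION: [&str; 3] = ["claude-opus-4-6", "claude-opus-4-7", "claude-opus-4-8"];
    if ONE_MILLION.iter().any(|prefix| model_id.starts_with(prefix)) {
        return Some(1_000_000);
    }
    // Generic Opus 4.x branch (e.g. claude-opus-4-1) stays at 200K.
    if model_id.starts_with("claude-opus-4-") {
        return Some(200_000);
    }
    // Unknown model: no fallback value in this sketch.
    None
}

#[test]
fn opus_prefix_ordering_example() {
    assert_eq!(context_length_for("claude-opus-4-8"), Some(1_000_000));
    assert_eq!(context_length_for("claude-opus-4-1"), Some(200_000));
}
```

If the two branches were swapped, `claude-opus-4-8` would match the generic prefix first and return 200_000, which is the bug described above.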

2. Default compaction token_threshold (100K) ignored large windows

The effective compaction trigger is min(configured_token_threshold, context_window * token_threshold_percentage) (crates/forge_domain/src/agent.rs). The embedded default token_threshold = 100000 (crates/forge_config/.forge.toml) made the trigger min(100K, 0.7 × 1M) = 100K on a 1M model — only ~10% of the window, firing compaction far too early and leaving ~900K tokens unused.
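The arithmetic behind the early trigger can be checked with a small sketch of the pre-fix rule. `old_effective_threshold` is a hypothetical helper, not the function in agent.rs, and the percentage is passed as an integer (70 for 70%) purely to keep the example free of floating point.

```rust
/// Pre-fix rule as described above: the configured threshold always
/// caps the window-derived value. Hypothetical helper for illustration.
pub fn old_effective_threshold(token_threshold: u64, context_window: u64, percent: u64) -> u64 {
    // e.g. 70% of a 1_000_000-token window is 700_000 tokens.
    let window_derived = context_window * percent / 100;
    token_threshold.min(window_derived)
}

#[test]
fn old_rule_caps_large_windows_example() {
    // 1M window with the 100K default: compaction fires at 100K.
    assert_eq!(old_effective_threshold(100_000, 1_000_000, 70), 100_000);
    // 128K window: the window-derived value is the smaller one.
    assert_eq!(old_effective_threshold(100_000, 128_000, 70), 89_600);
}
```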

Fix:

  • Remove the hardcoded token_threshold = 100000 from the embedded default config.
  • In compaction_threshold, when token_threshold is unset, derive it from the model's context window (70%); when it is explicitly set, keep treating it as an absolute cap (min with the window-derived value) for safety headroom.

Resulting behavior:

| Model context window | Configured token_threshold | Effective threshold |
|---|---|---|
| 128K (codex-spark) | none | 89.6K (70%) |
| 200K | none | 140K (70%) |
| 1M (Opus) | none | 700K (70%) |
| 1M (Opus) | 100K (explicit) | 100K (respected as cap) |
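The rows above can be reproduced with a minimal sketch of the optional-threshold rule. The name `effective_threshold`, the `Option<u64>` parameter, and the integer percentage are assumptions made for this example; the actual `compaction_threshold` signature in agent.rs may differ.

```rust
/// Sketch of the post-fix rule: an explicit threshold acts as a cap,
/// an unset one falls back to the window-derived value.
/// Hypothetical helper; `percent` stands in for token_threshold_percentage.
pub fn effective_threshold(token_threshold: Option<u64>, context_window: u64, percent: u64) -> u64 {
    let window_derived = context_window * percent / 100;
    match token_threshold {
        // Explicit value: keep it as an absolute cap.
        Some(cap) => cap.min(window_derived),
        // Unset: size the trigger from the window alone.
        None => window_derived,
    }
}

#[test]
fn resulting_behavior_table_example() {
    assert_eq!(effective_threshold(None, 128_000, 70), 89_600);
    assert_eq!(effective_threshold(None, 200_000, 70), 140_000);
    assert_eq!(effective_threshold(None, 1_000_000, 70), 700_000);
    assert_eq!(effective_threshold(Some(100_000), 1_000_000, 70), 100_000);
}
```

Keeping the `Some` branch as a `min` means an explicit value can lower the trigger but cannot push it past the window-derived limit.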

Updated test_compaction_threshold_uses_context_window_percentage_when_unset and added test_compaction_threshold_large_window_not_capped_to_hardcoded_default. Existing tests that explicitly set token_threshold are unaffected (they exercise the Some cap branch).

Testing

No Rust toolchain was available in my local environment to run cargo test, so I'm relying on CI. The changes are small and the affected unit tests were updated to match the new behavior.

@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Akhil Appana does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@github-actions (bot) added the labels "os: windows" and "type: fix" on Jun 15, 2026

message_threshold = 200
on_turn_end = false
retention_window = 6
token_threshold = 100000

Contributor

This should be kept as is, since users can configure it.

Contributor Author

Good point — done. I've restored token_threshold = 100000 in crates/forge_config/.forge.toml so it stays user-configurable.

To still fix the large-window issue I made compact.token_threshold optional in compaction_threshold (crates/forge_domain/src/agent.rs):

  • When set (the shipped default 100K, or any user value) → treated as an absolute cap: min(token_threshold, 70% × context_window), preserving the small-window headroom safety.
  • When unset → derived purely from the context window (70%), so large windows (e.g. 1M Opus) aren't capped to a small hardcoded value.

So the default behavior is unchanged for everyone, the config knob is preserved, and users on large-context models can either raise token_threshold or unset it to get automatic window-based sizing. Force-pushed as e375e34.

Two related fixes so large context windows are used fully instead of
being treated as 200K / capped to a small hardcoded compaction threshold.

1. get_context_length() returned 200K for 1M-token Opus models. The
   generic `claude-opus-4-` prefix branch captured claude-opus-4-6/4-7/4-8,
   which are 1M-token models. Add an explicit 1M branch before it.

2. Make the compaction `token_threshold` optional so it no longer forces a
   small cap on large windows. The configurable default is kept in
   crates/forge_config/.forge.toml (token_threshold = 100000) so users can
   still tune it. When it is set it is treated as an absolute cap (lower of
   it and 70% of the context window) preserving headroom on small windows;
   when it is unset the threshold is derived purely from the context window
   (70%), so large windows (e.g. 1M-token models) are not capped to a small
   hardcoded value.

Fixes tailcallhq#3518
@akhil29 force-pushed the fix/large-context-window-handling branch from 9483f26 to e375e34 on June 16, 2026 05:51
Development

Successfully merging this pull request may close these issues.

Large context windows (1M Opus) mishandled: wrong get_context_length + compaction hard-capped at 100K

3 participants