fix: handle large (1M) context windows for Opus 4.x and compaction#3519
fix: handle large (1M) context windows for Opus 4.x and compaction#3519akhil29 wants to merge 2 commits into
Conversation
|
Akhil Appana seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
| message_threshold = 200 | ||
| on_turn_end = false | ||
| retention_window = 6 | ||
| token_threshold = 100000 |
There was a problem hiding this comment.
This should be kept same as user can configure it.
There was a problem hiding this comment.
Good point — done. I've restored token_threshold = 100000 in crates/forge_config/.forge.toml so it stays user-configurable.
To still fix the large-window issue I made compact.token_threshold optional in compaction_threshold (crates/forge_domain/src/agent.rs):
- When set (the shipped default 100K, or any user value) → treated as an absolute cap:
min(token_threshold, 70% × context_window), preserving the small-window headroom safety. - When unset → derived purely from the context window (70%), so large windows (e.g. 1M Opus) aren't capped to a small hardcoded value.
So the default behavior is unchanged for everyone, the config knob is preserved, and users on large-context models can either raise token_threshold or unset it to get automatic window-based sizing. Force-pushed as e375e34.
Two related fixes so large context windows are used fully instead of being treated as 200K / capped to a small hardcoded compaction threshold. 1. get_context_length() returned 200K for 1M-token Opus models. The generic `claude-opus-4-` prefix branch captured claude-opus-4-6/4-7/4-8, which are 1M-token models. Add an explicit 1M branch before it. 2. Make the compaction `token_threshold` optional so it no longer forces a small cap on large windows. The configurable default is kept in crates/forge_config/.forge.toml (token_threshold = 100000) so users can still tune it. When it is set it is treated as an absolute cap (lower of it and 70% of the context window) preserving headroom on small windows; when it is unset the threshold is derived purely from the context window (70%), so large windows (e.g. 1M-token models) are not capped to a small hardcoded value. Fixes tailcallhq#3518
9483f26 to
e375e34
Compare
Fixes #3518
Two related fixes so that large context windows (e.g. Claude Opus 4.x at 1M tokens) are used fully, instead of being treated as 200K or having compaction capped to a small hardcoded threshold.
1.
get_context_length()returned 200K for 1M-token Opus modelscrates/forge_app/src/dto/anthropic/response.rsmatched the prefixclaude-opus-4-and returned200_000, which incorrectly capturedclaude-opus-4-6/4-7/4-8— all 1M-token models percrates/forge_repo/src/provider/provider.json(e.g."id": "claude-opus-4-8", "context_length": 1000000).Fix: add an explicit 1M branch for the
claude-opus-4-6/4-7/4-8prefixes before the genericclaude-opus-4-200K branch. Older Opus 4.x models (4,4-1) remain at 200K. Addedtest_get_context_length_opus_1m_models.2. Default compaction
token_threshold(100K) ignored large windowsThe effective compaction trigger is
min(configured_token_threshold, context_window * token_threshold_percentage)(crates/forge_domain/src/agent.rs). The embedded defaulttoken_threshold = 100000(crates/forge_config/.forge.toml) made the triggermin(100K, 0.7 × 1M) = 100Kon a 1M model — only ~10% of the window, firing compaction far too early and leaving ~900K tokens unused.Fix:
token_threshold = 100000from the embedded default config.compaction_threshold, whentoken_thresholdis unset, derive it from the model's context window (70%); when it is explicitly set, keep treating it as an absolute cap (minwith the window-derived value) for safety headroom.Resulting behavior:
token_thresholdUpdated
test_compaction_threshold_uses_context_window_percentage_when_unsetand addedtest_compaction_threshold_large_window_not_capped_to_hardcoded_default. Existing tests that explicitly settoken_thresholdare unaffected (they exercise theSomecap branch).Testing
No Rust toolchain was available in my local environment to run
cargo test, so I'm relying on CI. The changes are small and the affected unit tests were updated to match the new behavior.