T5-P4-A6-WP1 Enforce the contract — capability-gate context exhaustion detection#4099

Open
Trecek wants to merge 3 commits into develop from t5-p4-a6-wp1-enforce-the-contract-once-p2-a1-wp3-lands-its-c/4023

Trecek (Collaborator) commented Jun 13, 2026

Summary

Add a capabilities parameter to classify_infra_exit so context-exhaustion classification is gated by supports_context_exhaustion_detection. Remove the stale _FORWARD_DECLARED exemption and add branch-level coverage for both the True and False paths. The prerequisite work (P2-A1-WP3) did not land a capability-gated branch, so this plan includes adding the gate itself.
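For reviewers, the shape of the gate can be sketched roughly as follows. This is a minimal illustration, not the code in `_exit_classification.py`: the `_sketch`-suffixed names, the `InfraExit` enum, the regex, and the `(stderr, session_exhausted, capabilities)` signature are all assumptions made for the example. Only the `supports_context_exhaustion_detection` field name comes from this PR.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InfraExit(Enum):
    # Hypothetical enum; the real classification type is not shown in this PR.
    CONTEXT_EXHAUSTED = "context_exhausted"


@dataclass(frozen=True)
class BackendCapabilitiesSketch:
    # Field name taken from the PR; the class itself is a stand-in.
    supports_context_exhaustion_detection: bool = False


# Hypothetical pattern standing in for _CODEX_CONTEXT_EXHAUSTION_PATTERN.
_EXHAUSTION_PATTERN_SKETCH = re.compile(
    r"context (window|length) exceeded", re.IGNORECASE
)


def classify_infra_exit_sketch(
    stderr: str,
    session_exhausted: bool,
    capabilities: BackendCapabilitiesSketch,
) -> Optional[InfraExit]:
    """Return CONTEXT_EXHAUSTED only when the backend advertises detection."""
    if capabilities.supports_context_exhaustion_detection:
        # Both checks live inside the gate, mirroring the two wrapped checks.
        if session_exhausted or _EXHAUSTION_PATTERN_SKETCH.search(stderr):
            return InfraExit.CONTEXT_EXHAUSTED
    # None stands in for "fall through to downstream classification".
    return None


gated_on = BackendCapabilitiesSketch(supports_context_exhaustion_detection=True)
gated_off = BackendCapabilitiesSketch(supports_context_exhaustion_detection=False)

result_on = classify_infra_exit_sketch("Context window exceeded", False, gated_on)
result_off = classify_infra_exit_sketch("Context window exceeded", True, gated_off)
```

With the capability set, the same input classifies as context exhaustion; with it unset, the function returns `None` even when both signals are present, which is the behavior the two new branch tests are meant to pin down.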

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-20260612-175100-661664/.autoskillit/temp/make-plan/t5_p4_a6_wp1_enforce_capability_gate_plan_2026-06-12_175700.md

🤖 Generated with Claude Code via AutoSkillit

Closes #4023

Token Usage Summary

| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|---|---|---|---|---|---|---|---|---|---|
| plan* | opus[1m] | 1 | 61 | 15.7k | 1.8M | 98.9k | 47 | 99.5k | 10m 11s |
| verify* | sonnet | 1 | 3.3k | 10.1k | 483.1k | 62.9k | 30 | 42.0k | 7m 11s |
| implement* | MiniMax-M3 | 1 | 2.6M | 10.4k | 0 | 0 | 90 | 0 | 5m 35s |
| merge_gate_fix* | sonnet | 1 | 142 | 15.7k | 1.0M | 77.5k | 47 | 56.6k | 7m 56s |
| audit_impl* | sonnet | 1 | 44 | 10.3k | 173.9k | 49.4k | 14 | 39.2k | 6m 17s |
| prepare_pr* | MiniMax-M3 | 1 | 188.1k | 1.7k | 0 | 0 | 15 | 0 | 36s |
| compose_pr* | MiniMax-M3 | 1 | 211.8k | 1.1k | 0 | 0 | 13 | 0 | 38s |
| Total | | | 3.0M | 65.0k | 3.5M | 98.9k | | 237.3k | 38m 26s |

* Step used a non-Anthropic provider; caching behavior may differ.

Token Efficiency

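| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|---|---|---|---|---|
| plan | 0 | | | |
| verify | 0 | | | |
| implement | 232 | 0.0 | 0.0 | 44.7 |
| merge_gate_fix | 50 | 20223.3 | 1131.4 | 314.9 |
| audit_impl | 0 | | | |
| prepare_pr | 0 | | | |
| compose_pr | 0 | | | |
| Total | 282 | 12375.0 | 841.4 | 230.7 |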

Model Usage Breakdown

| Model | steps | uncached | output | cache_read | cache_write | time |
|---|---|---|---|---|---|---|
| opus[1m] | 1 | 61 | 15.7k | 1.8M | 99.5k | 10m 11s |
| sonnet | 3 | 3.5k | 36.2k | 1.7M | 137.8k | 21m 25s |
| MiniMax-M3 | 3 | 3.0M | 13.1k | 0 | 0 | 6m 49s |

Trecek and others added 3 commits June 12, 2026 18:12
…t_exhaustion_detection capability

classify_infra_exit now requires a BackendCapabilities argument and wraps
the two CONTEXT_EXHAUSTED checks (session._is_context_exhausted() and
_CODEX_CONTEXT_EXHAUSTION_PATTERN search) in
capabilities.supports_context_exhaustion_detection. Backends that do not
implement context-exhaustion detection fall through to downstream
classification.

The production call site in _headless_result._build_skill_result passes
backend.capabilities (already in scope).

Removes supports_context_exhaustion_detection from _FORWARD_DECLARED in
test_capability_consumption.py — the field now has a production consumer
in _exit_classification.py, so the exemption is no longer needed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
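
As background on why the `_FORWARD_DECLARED` entry can go: a consumption check of this kind typically asserts that every capability field is read somewhere in production code unless it is explicitly exempted. The sketch below shows one way such a check could work. It is an assumption about the general technique, not the contents of `test_capability_consumption.py`: `CapabilitiesSketch`, `supports_future_feature`, and the substring-based scan are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CapabilitiesSketch:
    supports_context_exhaustion_detection: bool = False
    # Hypothetical field with no production consumer yet.
    supports_future_feature: bool = False


# Hypothetical exemption set: fields declared ahead of their first consumer.
FORWARD_DECLARED_SKETCH = frozenset({"supports_future_feature"})


def unconsumed_fields(production_source: str, exempt: frozenset) -> list:
    """List non-exempt capability fields never read as `<obj>.<field>`."""
    return sorted(
        f.name
        for f in fields(CapabilitiesSketch)
        if f.name not in exempt and f".{f.name}" not in production_source
    )


# Stand-in for production code that now reads the gated field.
source_with_gate = (
    "if capabilities.supports_context_exhaustion_detection:\n"
    "    return CONTEXT_EXHAUSTED\n"
)

missing_with_gate = unconsumed_fields(source_with_gate, FORWARD_DECLARED_SKETCH)
missing_without_gate = unconsumed_fields("", FORWARD_DECLARED_SKETCH)
```

Once production code reads the field, the check passes without the exemption, so keeping the field in the exempt set would only hide a future regression.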
…y-gate coverage

Updates 37 callers of classify_infra_exit across three test files to pass
capabilities=CLAUDE_CODE_CAPABILITIES (preserves existing behavior since
the Claude Code capabilities have supports_context_exhaustion_detection=True):
- test_exit_classification.py: 27 calls
- test_adapt_agent_result.py: 2 calls
- test_api_error_signal_invariants.py: 8 calls

Adds TestContextExhaustionCapabilityGate with two tests:
- test_capability_false_suppresses_context_exhausted (False branch coverage)
- test_capability_true_allows_context_exhausted (True branch coverage)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ill_result against xdist timing races

The test used CHANNEL_B_NO_STDOUT_SCRIPT (2.0s gap) and lacked
natural_exit_grace_seconds, making it susceptible to a Phase 1 detection
race on WSL2 under 4-worker xdist load: if Phase 1 asyncio scheduling
is delayed >2.0s, Phase 2 initializes scan_pos after the marker is
already in the file and Channel B never fires, producing TIMED_OUT
instead of COMPLETED.

Fixes:
- Inline a 3.0s-gap script (matching the WSL2-jitter margin documented
  in CHANNEL_B_THEN_A_CONFIRM_SCRIPT) so Phase 1 always discovers the
  file before the marker arrives
- Add natural_exit_grace_seconds=0.1 to avoid asyncio-waitpid thread
  contention (script hangs with time.sleep(3600))
- Increase timeout to 300 and _phase1_timeout to 400 (consistent with
  proven-stable drain tests in TestChannelBDrainWait)
- Increase pytest.mark.timeout to 360 (matches peer drain tests)
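
The race described above can be illustrated with a small, self-contained sketch. It assumes that Phase 2 sets `scan_pos` to the file's size at the moment the file is discovered and only searches from there; that detail, the `MARKER` bytes, and the `marker_seen` helper are hypothetical and are not taken from the monitored code.

```python
# Hypothetical completion marker written by the session script.
MARKER = b"DONE_MARKER"


def marker_seen(file_bytes: bytes, scan_pos: int) -> bool:
    """Search only the bytes at or after scan_pos, as an incremental scan would."""
    return file_bytes.find(MARKER, scan_pos) != -1


before_marker = b"session output line\n"
after_marker = before_marker + MARKER + b"\n"

# Discovery on time: scan_pos is taken before the marker is written,
# so a later scan of the grown file finds it.
on_time = marker_seen(after_marker, len(before_marker))

# Discovery delayed past the gap: scan_pos is taken after the marker is
# already in the file, so the scan starts beyond it and never matches.
too_late = marker_seen(after_marker, len(after_marker))
```

Under that assumption, widening the gap before the marker is written gives discovery more time to record `scan_pos` ahead of the marker, which is the effect the 3.0s script is intended to have.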