Compaction keeps the context window under budget by summarizing old conversation history into a compact text block. When the context exceeds the reserve threshold, a separate LLM call summarizes everything before a cut point, and a CompactionEntry is appended to the session. On the next turn, build_session_context() presents the summary plus the retained tail instead of the full history.
Compaction is checked at the end of every successful invoke() call in Agent:
if self._compact_requested or self._compaction.should_compact(
self._context_tokens, self._context_window
):
await self._run_compaction(...)should_compact() triggers when:
context_tokens > context_window - reserve_tokensThe default reserve_tokens gives the next turn room to work without immediately hitting the limit again.
Extensions may call ctx.compact() mid-turn to set _compact_requested = True. The actual compaction always waits until after the turn completes.
run_compaction() is also a public API for manual compaction. It requires phase == "idle" and fails immediately if called while a turn is running.
Compaction.prepare(path_entries) analyzes the session branch and returns a CompactionPreparation — everything needed for the LLM summarization call. Returns None if compaction is not possible (e.g., the last entry is already a compaction, or no valid cut point exists).
Steps:
- Find the previous compaction entry index (
find_prev_compaction_index). - Determine the boundary start — where the region to be summarized begins.
- Estimate total context tokens (
estimate_context_tokens). - Find the cut point (
find_cut_point) — the entry at whichretain_recent_tokenskicks in. Everything before the cut is summarized; everything from the cut to the leaf is retained. - Check whether the cut falls inside a turn (split-turn case).
- Collect messages to summarize and, if split-turn, the turn-prefix messages.
- Build a file-operations diff from previous compaction forward (tracks which files were read or modified for the compaction summary footer).
The CompactionPreparation carries:
@dataclass
class CompactionPreparation:
retained_from_id: str # first entry the LLM still sees in full
messages_to_summarize: list[...] # messages that become the summary
turn_prefix_messages: list[...] # split-turn prefix (if any)
is_split_turn: bool
tokens_before: int
previous_summary: str | None # from the last compaction entry
file_ops: dict # file read/write tracking
settings: CompactionSettingsCompaction.compact(preparation, custom_instructions) calls the LLM to produce the summary.
Normal case: One LLM call with the full message history to summarize. The prompt combines the previous summary (if any) with the new messages using UPDATE_SUMMARIZATION_PROMPT or SUMMARIZATION_PROMPT.
Split-turn case: The cut point falls inside an incomplete assistant turn (the LLM was mid-response when the context limit hit). Two concurrent LLM calls run:
- Summarize the history before the split turn (or skip if empty).
- Summarize just the turn-prefix messages to produce a "Turn Context" block.
The results are concatenated with a --- divider.
After summarization, a file-operations footer is appended:
- Files read during the compacted period.
- Files modified (created, updated, deleted) during the compacted period.
The final CompactionResult:
@dataclass
class CompactionResult:
summary: str # the LLM-generated summary + file footer
retained_from_id: str # entry ID where the retained tail begins
tokens_before: int # context size before compaction
details: CompactionDetails # read_files, modified_files listsAfter compact() returns, Agent calls:
self._session_manager.append_compaction(
summary=compaction_result.summary,
first_kept_entry_id=compaction_result.retained_from_id,
tokens_before=compaction_result.tokens_before,
details=compaction_result.details,
)
self._refresh_context_tokens_from_session()refresh_context_tokens_from_session() re-estimates the token count from the session's reconstructed message list rather than trusting the historical input_tokens from the last assistant message. Pre-compaction assistant messages still carry their original (larger) provider usage — using those would give a stale, inflated token count.
Extensions can intercept compaction via session_before_compact:
@hooks.on('session_before_compact')
async def on_before_compact(event, ctx):
return SessionBeforeCompactResult(cancel=True) # skip compaction
# or:
return SessionBeforeCompactResult(compaction=my_result) # supply custom summaryIf any handler returns cancel=True, compaction is skipped and Agent returns False from _run_compaction(). If any handler supplies a CompactionResult, the LLM call is skipped and that result is used directly.
After compaction, session_compact fires with the newly appended CompactionEntry.
@dataclass
class CompactionSettings:
enabled: bool = True
reserve_tokens: int = 8192 # headroom to keep for the next turn
keep_recent_tokens: int = 16384 # tokens to retain uncompressed after the cutSettings are read live via a provider callback so changes to settings.json take effect on the next compaction check without restarting.
See session.md — specifically the build_session_context() section — for how the LLM receives the compacted context on subsequent turns.
- agent.md — Compaction scheduling inside
invoke() - session.md — How
CompactionEntryaffects context reconstruction - hooks.md —
session_before_compactandsession_compactevents