Agent is the orchestration layer above the low-level Engine. It owns session persistence, retry logic, extension event dispatch, compaction scheduling, and the interface through which extensions interact with the running session.
This document describes the current behavior and the design intent behind the wiring decisions.

    invoke(user_input)
      └─ _run_with_retry(ctx, user_entry_id)
           └─ engine.run(ctx)                ← Engine streams LLM events
                │
                ├─ _on_engine_event          ← re-dispatches to ExtensionRuntime
                ├─ _before_tool_call         ← emits 'tool_call', can block
                ├─ _after_tool_call          ← emits 'tool_result', can patch
                └─ _get_ephemeral_messages   ← injects live desktop/browser state per turn

Agent tracks phase as an `AgentPhase` enum:

    from enum import Enum

    class AgentPhase(str, Enum):
        IDLE = "idle"
        TURN = "turn"
        COMPACTION = "compaction"

Structural operations require `phase == AgentPhase.IDLE`:

- `invoke()`
- `run_compaction()`

Both set the phase synchronously before the first await and restore it in a finally block.
Calling either while the agent is busy raises RuntimeError immediately.
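
The guard can be sketched as follows. This is a minimal illustration, not the actual Agent code: `PhaseGuardedAgent`, `_do_turn`, and `demo` are hypothetical names, and only the phase handling is modelled.

```python
import asyncio
from enum import Enum


class AgentPhase(str, Enum):
    IDLE = "idle"
    TURN = "turn"
    COMPACTION = "compaction"


class PhaseGuardedAgent:
    """Hypothetical stand-in for Agent; only the phase handling is modelled."""

    def __init__(self) -> None:
        self.phase = AgentPhase.IDLE

    async def invoke(self, user_input: str) -> str:
        if self.phase != AgentPhase.IDLE:
            raise RuntimeError(f"agent is busy (phase={self.phase.value})")
        # Set synchronously, before the first await, so a concurrent caller
        # cannot slip past the check above.
        self.phase = AgentPhase.TURN
        try:
            return await self._do_turn(user_input)
        finally:
            self.phase = AgentPhase.IDLE

    async def _do_turn(self, user_input: str) -> str:
        await asyncio.sleep(0)  # placeholder for the real turn
        return f"handled: {user_input}"


async def demo() -> list:
    agent = PhaseGuardedAgent()
    # Two concurrent calls: the first wins, the second fails fast.
    return await asyncio.gather(
        agent.invoke("first"), agent.invoke("second"), return_exceptions=True
    )
```

Because the phase is flipped before any `await`, the second call observes `TURN` and raises without ever touching session state.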
`invoke()` follows this sequence:

    1. Assert phase == AgentPhase.IDLE
       │
    2. Call guardrail.on_turn_start() on all guardrails (reset per-turn state)
       │
    3. Emit 'input' event
       ├─ Extensions see raw user text
       │
    4. Rebuild system prompt (or hit per-channel cache)
       ├─ Load skills, knowledge
       ├─ Apply AGENT.md + SYSTEM.md
       │
    5. Emit 'before_agent_start'
       ├─ Extensions may replace system prompt
       │
    6. Reconstruct message history
       ├─ Load session tree from disk
       ├─ Compute leaf-to-root path
       ├─ Apply any compaction summaries
       │
    7. Emit 'context'
       ├─ Extensions may replace/filter messages
       │
    8. Persist user message
       ├─ Appended once (not per retry)
       │
    9. Merge tools
       ├─ Builtin tools override extensions
       │
    10. Enter _run_with_retry()
        ├─ Loop: attempt Engine.run()
        ├─ On error: rewind + exponential backoff
        ├─ On success: break
        │
    11. After success
        ├─ Emit 'save_point'
        ├─ Persist AssistantMessage to disk
        ├─ Emit 'agent_end'
        │
    12. Check compaction
        ├─ If context_tokens > window - reserve_tokens
        ├─ Run compaction, emit 'after_compaction'
        │
    13. Check for queued follow-ups
        └─ If none: emit 'settled'

Key invariants:
- Session writes are deferred until `AssistantMessage` (turn succeeded)
- Failed retries fully rewind before next attempt
- User message is persisted once before retry loop (can be removed if all retries fail)
- Compaction only runs after `save_point` (turn complete, session durable)
- Guardrails run on every tool call within a turn; errors from guardrails never abort a turn (the decision itself is the signal)
`_run_with_retry()` implements exponential-backoff retry on Engine errors.

    for attempt in range(max_retries + 1):
        # Collect the IDs of entries written during this attempt for rollback.
        unsubscribe = self._register_message_handler(persisted_ids)
        try:
            await self._engine.run(ctx)
        finally:
            unsubscribe()
        error = self._engine.state.error_message
        if error is None:
            return  # success
        # Failed attempt: undo its writes, reset the Engine, then back off.
        self._rewind_session(persisted_ids)
        self._engine.reset()
        await asyncio.sleep(base_delay_s * 2 ** (attempt - 1))

Every engine error is classified by `classify_error(exc) -> ClassifiedError` before deciding whether to retry:

| ErrorKind | HTTP | Retryable | Notes |
|---|---|---|---|
| `RATE_LIMIT` | 429 | yes | Backoff + retry |
| `OVERLOADED` | 503 / 529 | yes | Backoff + retry |
| `SERVER_ERROR` | 500 / 502 | yes | Retry |
| `TIMEOUT` | n/a | yes | Retry |
| `CONTEXT_OVERFLOW` | 400 / 413 | yes | Run compaction first, then retry |
| `UNKNOWN` | n/a | yes | Retry with backoff |
| `AUTH` | 401 / 403 | no | Abort immediately |
| `AUTH_PERMANENT` | 401 / 403 | no | Abort immediately |
| `BILLING` | 402 / 429 | no | Abort immediately |
| `MODEL_NOT_FOUND` | 404 | no | Abort immediately |
| `CONTENT_BLOCKED` | n/a | no | Abort immediately |
| `FORMAT_ERROR` | 400 | no | Abort immediately |

CONTEXT_OVERFLOW triggers an automatic compaction pass before the next retry
attempt, so the context is within budget when the LLM is called again.
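
The retry decision in the table can be read as a simple lookup. The sketch below is illustrative only: the `ErrorKind` member names come from the table, but the enum values, `RETRYABLE`, and `next_action` are assumptions and not the real `classify_error` machinery.

```python
from enum import Enum, auto


class ErrorKind(Enum):
    RATE_LIMIT = auto()
    OVERLOADED = auto()
    SERVER_ERROR = auto()
    TIMEOUT = auto()
    CONTEXT_OVERFLOW = auto()
    UNKNOWN = auto()
    AUTH = auto()
    AUTH_PERMANENT = auto()
    BILLING = auto()
    MODEL_NOT_FOUND = auto()
    CONTENT_BLOCKED = auto()
    FORMAT_ERROR = auto()


# Kinds marked "yes" in the Retryable column.
RETRYABLE = frozenset({
    ErrorKind.RATE_LIMIT,
    ErrorKind.OVERLOADED,
    ErrorKind.SERVER_ERROR,
    ErrorKind.TIMEOUT,
    ErrorKind.CONTEXT_OVERFLOW,
    ErrorKind.UNKNOWN,
})


def next_action(kind: ErrorKind) -> str:
    """Hypothetical helper: map a classified error to what the retry loop does."""
    if kind not in RETRYABLE:
        return "abort"
    if kind is ErrorKind.CONTEXT_OVERFLOW:
        return "compact_then_retry"  # shrink the context before calling the LLM again
    return "retry"
```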
Example walkthrough:
Session state (before invoke):
leaf_id = "msg-2"
[msg-0] User: "fix the auth bug"
[msg-1] Assistant: "I'll help..."
[msg-2] Tool result: {...}
User input: "continue working"
Attempt 0:
- Persist user msg-3: [..., msg-3-user]
- Engine runs, LLM returns assistant msg (pending)
- Engine fails with timeout
- Rewind: remove unpersisted messages (LLM response)
- Reset Engine, wait 2s
Attempt 1:
- Engine runs again, succeeds
- Persist assistant msg-4 to disk
- Return to invoke()
Final session on disk:
[msg-0] User: "fix the auth bug"
[msg-1] Assistant: "I'll help..."
[msg-2] Tool result: {...}
[msg-3] User: "continue working"
[msg-4] Assistant: "resuming..."
If all retries are exhausted, the user message (msg-3) is also rewound and removed.
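
The walkthrough can be reproduced with a toy model. This is a sketch under simplifying assumptions: the session is a plain list, `outcomes` scripts whether each attempt succeeds, and `run_with_retry` is a hypothetical function, not the real `_run_with_retry()`.

```python
import asyncio
from typing import List


async def run_with_retry(
    session: List[str],
    user_msg: str,
    outcomes: List[bool],
    max_retries: int = 3,
    base_delay_s: float = 0.0,
) -> bool:
    session.append(user_msg)  # persisted once, before the loop
    for attempt in range(max_retries + 1):
        mark = len(session)
        session.append(f"assistant (attempt {attempt})")  # written during the attempt
        if outcomes[attempt]:
            return True
        del session[mark:]  # rewind everything this attempt wrote
        await asyncio.sleep(base_delay_s)
    session.pop()  # all retries failed: drop the user message too
    return False


# One failure followed by a success, as in the walkthrough.
ok_session = ["msg-0", "msg-1", "msg-2"]
ok = asyncio.run(run_with_retry(ok_session, "user: continue working", [False, True]))

# Every attempt fails: the session ends up exactly as it started.
bad_session = ["msg-0", "msg-1", "msg-2"]
bad = asyncio.run(run_with_retry(bad_session, "user: continue working", [False] * 4))
```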
Configuration:
- Disable: `config.retry_enabled = False`
- Max attempts: `config.retry_config.max_retries = 5` (default: 3)
- Delay schedule: `base_delay_ms * 2^(attempt-1)`, pure exponential backoff with no jitter
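
To make the schedule concrete, the formula can be evaluated directly. The 1000 ms base below is a hypothetical value chosen for illustration, and `retry_delay_ms` is not part of the real API.

```python
def retry_delay_ms(base_delay_ms: float, attempt: int) -> float:
    """Evaluate base_delay_ms * 2^(attempt-1) exactly as written, with no jitter."""
    return base_delay_ms * 2 ** (attempt - 1)


# Delays for attempt values 1, 2 and 3 with a hypothetical 1000 ms base.
schedule = [retry_delay_ms(1000, attempt) for attempt in (1, 2, 3)]
```

Each step doubles the previous delay. Note that the exponent is `attempt - 1`, so an `attempt` value of 0 yields half the base delay.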
Messages are persisted in the _register_message_handler hook, registered once per attempt:
- `AssistantMessage`: persisted on `message_end`; token usage is extracted and `_context_tokens` is updated.
- `ToolMessage`: persisted on `message_end`.

The handler is unregistered in a finally block so it is not called after the attempt ends, regardless of success or error. persisted_ids accumulates entry IDs for rollback.
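
A rough sketch of the register, unsubscribe, and rollback pattern follows. `MiniSession` is a hypothetical in-memory stand-in; the real handler and session store are not shown in this document.

```python
from typing import Callable, Dict, List


class MiniSession:
    """Hypothetical in-memory stand-in for the session store."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}
        self._handlers: List[Callable[[str, str], None]] = []

    def register_message_handler(self, persisted_ids: List[str]) -> Callable[[], None]:
        def on_message_end(entry_id: str, text: str) -> None:
            self.entries[entry_id] = text   # persist the message
            persisted_ids.append(entry_id)  # remember it for rollback

        self._handlers.append(on_message_end)
        return lambda: self._handlers.remove(on_message_end)

    def emit_message_end(self, entry_id: str, text: str) -> None:
        for handler in list(self._handlers):
            handler(entry_id, text)

    def rewind(self, persisted_ids: List[str]) -> None:
        while persisted_ids:
            self.entries.pop(persisted_ids.pop(), None)


session = MiniSession()
persisted_ids: List[str] = []
unsubscribe = session.register_message_handler(persisted_ids)
try:
    session.emit_message_end("msg-4", "partial answer")
finally:
    unsubscribe()
session.emit_message_end("msg-5", "late event")  # ignored: the handler is gone
snapshot = dict(session.entries)
session.rewind(persisted_ids)  # roll back what the attempt wrote
```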
Engine fires low-level events through `options.on_event`. Agent intercepts every event and re-dispatches it to each loaded extension's handlers via `_on_engine_event()`:

    async def _on_engine_event(self, event):
        event_type = getattr(event, 'type', None)
        for ext in self._extensions._extensions:
            for handler in ext.handlers.get(event_type, []):
                await handler(event, self)

Extension errors are caught and appended to `self._extensions._errors`. They do not abort the active turn.
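
One way the error isolation could look is sketched below. Where exactly the real code catches exceptions is not shown in this document, so the `try`/`except` placement, the `Extension` class, and the `dispatch` function are assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Handler = Callable[[Any, Any], Awaitable[None]]


class Extension:
    """Hypothetical minimal extension: a name plus handlers keyed by event type."""

    def __init__(self, name: str, handlers: Dict[str, List[Handler]]) -> None:
        self.name = name
        self.handlers = handlers


class Event:
    def __init__(self, type: str) -> None:
        self.type = type


async def dispatch(event: Any, ctx: Any, extensions: List[Extension], errors: List[tuple]) -> None:
    event_type = getattr(event, "type", None)
    for ext in extensions:
        for handler in ext.handlers.get(event_type, []):
            try:
                await handler(event, ctx)
            except Exception as exc:  # isolate the failure, keep dispatching
                errors.append((ext.name, event_type, exc))


seen: List[str] = []


async def broken(event: Any, ctx: Any) -> None:
    raise ValueError("boom")


async def recorder(event: Any, ctx: Any) -> None:
    seen.append(event.type)


errors: List[tuple] = []
asyncio.run(dispatch(
    Event("turn_start"),
    None,
    [Extension("bad", {"turn_start": [broken]}), Extension("good", {"turn_start": [recorder]})],
    errors,
))
```

The failing handler is recorded, and the handler registered by the second extension still runs.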
Agent sets four callbacks on `Engine.options` at construction time:

    options.before_tool_call = self._before_tool_call
    options.after_tool_call = self._after_tool_call
    options.on_event = self._on_engine_event
    options.get_ephemeral_messages = self._get_ephemeral_messages

before_tool_call: Emits tool_call to extensions. If any handler returns ToolCallEventResult(block=True), the call is short-circuited and a ToolResultContent(is_error=True) is returned to the Engine. The LLM sees this as a tool error.
after_tool_call: Emits tool_result to extensions. Handlers may return ToolResultEventResult to patch content, is_error, or set terminate=True. Patches accumulate — each handler sees the previous patched state. Setting terminate=True signals the Engine to skip the follow-up LLM call if every tool in the batch terminates.
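
Patch accumulation and the batch-level terminate rule can be sketched like this. `ToolResult`, `Patch`, `apply_handlers`, and `skip_follow_up` are hypothetical stand-ins for `ToolResultContent`, `ToolResultEventResult`, and the Engine's internal logic.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ToolResult:
    content: str
    is_error: bool = False
    terminate: bool = False


@dataclass(frozen=True)
class Patch:
    """Hypothetical handler return value; None means 'leave this field as is'."""
    content: Optional[str] = None
    is_error: Optional[bool] = None
    terminate: Optional[bool] = None


def apply_handlers(
    result: ToolResult, handlers: List[Callable[[ToolResult], Optional[Patch]]]
) -> ToolResult:
    for handler in handlers:
        patch = handler(result)  # sees the state left by earlier handlers
        if patch is None:
            continue
        changes = {k: v for k, v in vars(patch).items() if v is not None}
        result = replace(result, **changes)
    return result


def skip_follow_up(batch: List[ToolResult]) -> bool:
    # The follow-up LLM call is skipped only if every tool in the batch terminates.
    return bool(batch) and all(r.terminate for r in batch)


def redact(result: ToolResult) -> Patch:
    return Patch(content=result.content.replace("secret", "***"))


def stop_if_redacted(result: ToolResult) -> Optional[Patch]:
    return Patch(terminate=True) if "***" in result.content else None


patched = apply_handlers(ToolResult("token=secret"), [redact, stop_if_redacted])
```

The second handler only terminates because it sees the content already rewritten by the first, which is the accumulation behavior described above.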
get_ephemeral_messages: Called by the Engine at the start of each turn before the LLM call. Agent inspects tool_context.desktop and tool_context.browser — if either is open, get_state() is called and the result is wrapped in a UserMessage (with an image if use_screenshot=True). These messages are appended to ctx_messages for that LLM call only and are never written to session history. Controlled by ComputerUseSettings and BrowserUseSettings baked into the desktop/browser instances at runtime startup:
| Flag | Effect |
|---|---|
| `use_screenshot` | Screenshot is captured and embedded as an image in the message |
| `use_accessibility` | DOM/accessibility tree is captured and included as text |
| (always) | Basic app/window/tab info is always present when the tool is open |

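
The effect of the two flags, and the fact that ephemeral messages never reach session history, can be illustrated with a small sketch. The `ToolState` shape, the `UserMessage` fields, and `build_ephemeral` are assumptions made for this example; the real `get_state()` return type is not documented here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolState:
    """Hypothetical shape of what get_state() returns."""
    summary: str  # basic app/window/tab info
    accessibility_tree: Optional[str] = None
    screenshot: Optional[bytes] = None


@dataclass
class UserMessage:
    text: str
    images: List[bytes] = field(default_factory=list)


def build_ephemeral(state: ToolState, use_screenshot: bool, use_accessibility: bool) -> UserMessage:
    parts = [state.summary]  # always present
    if use_accessibility and state.accessibility_tree:
        parts.append(state.accessibility_tree)
    images = [state.screenshot] if use_screenshot and state.screenshot else []
    return UserMessage("\n".join(parts), images)


history = [UserMessage("open the settings page")]
state = ToolState("Browser: tab 'Settings'", "<button name='Save'>", b"\x89PNG")
# Ephemeral state is added to this call's messages only; history is untouched.
call_messages = history + [
    build_ephemeral(state, use_screenshot=False, use_accessibility=True)
]
```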
_rebuild_system_prompt() assembles the system prompt in this order:
- Identity: `SYSTEM.md` (replaces everything) → `SOUL.md` (replaces default persona) → `"You are a helpful assistant."` + guidelines.
- Docs reference: path to the `docs/` directory with a per-topic file map; injected only if the directory exists. Agent reads lazily when asked about Operator internals.
- Append section: `append_system_prompt` content (includes knowledge injection; see knowledge.md).
- Memory: contents of `MEMORY.md` if present.
- User profile: contents of `USER.md` if present.
- Skills: `<available_skills>` XML block with enforcement rules; only when `read` or `skill` tool is available. Each entry includes `<location>` with the absolute path to `SKILL.md`.
- Platform hint: channel-specific formatting rules (Telegram, Discord, Slack, etc.).
- Footer: always appended last.

When a profile is active, the footer expands to include:

    Current date: <ISO date>
    Current working directory: <cwd>
    Global directory: ~/.operator
    Profile directory: ~/.operator/profiles/<name>
    Profile SOUL.md: ~/.operator/profiles/<name>/SOUL.md           ← persona and identity
    Profile MEMORY.md: ~/.operator/profiles/<name>/MEMORY.md       ← long-term memory
    Profile USER.md: ~/.operator/profiles/<name>/USER.md           ← who you are talking to
    Profile skills directory: ~/.operator/profiles/<name>/skills
    Profile knowledge directory: ~/.operator/profiles/<name>/knowledge
    Profile temp directory: ~/.operator/profiles/<name>/temp       ← scratchpad / terminal CWD
    Session ID: <id>                                               ← only when a session exists

When a custom prompt (SYSTEM.md) is set it replaces the identity layer only — docs, memory, user, skills, platform, and footer are still appended.
After before_agent_start, extensions may replace the entire system prompt string.
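
The layering can be sketched as a fallback chain for the identity followed by optional sections. This is a simplified reading, not `_rebuild_system_prompt()` itself: it assumes the guidelines are still appended when `SOUL.md` replaces the default persona, models only the identity, memory, user, and footer layers, and uses a placeholder for the guidelines text.

```python
from typing import Dict

DEFAULT_PERSONA = "You are a helpful assistant."
GUIDELINES = "<guidelines>"  # placeholder; the real guideline text is not shown here


def resolve_identity(files: Dict[str, str]) -> str:
    if "SYSTEM.md" in files:
        return files["SYSTEM.md"]  # custom prompt replaces the whole identity layer
    persona = files.get("SOUL.md", DEFAULT_PERSONA)
    return persona + "\n" + GUIDELINES


def assemble(files: Dict[str, str], footer: str) -> str:
    sections = [
        resolve_identity(files),
        files.get("MEMORY.md"),  # skipped when absent
        files.get("USER.md"),    # skipped when absent
        footer,                  # always last
    ]
    return "\n\n".join(s for s in sections if s)


prompt = assemble(
    {"SYSTEM.md": "Custom.", "SOUL.md": "ignored", "MEMORY.md": "likes tea"},
    "Current date: 2024-01-01",
)
```

Even with a custom `SYSTEM.md`, the memory section and footer are still appended, matching the rule that only the identity layer is replaced.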
Compaction runs after `agent_end` (the turn is complete and session writes are durable):

    if self._compact_requested or self._compaction.should_compact(
        self._context_tokens, self._context_window
    ):
        await self._run_compaction(opts.compaction_custom_instructions)

Extensions may call `ctx.compact()` during a turn to set `_compact_requested`. The actual compaction always runs after the turn, never mid-turn.
run_compaction() is also a public API for manual compaction. It checks phase == AgentPhase.IDLE and fails fast if called while a turn is running. After a successful manual compaction it emits save_point and optionally settled.
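
The trigger condition combines the deferred request flag with the token threshold from the invoke sequence (`context_tokens > window - reserve_tokens`). The sketch below uses a hypothetical `CompactionTrigger` class and made-up window sizes; it does not model when the flag is cleared.

```python
class CompactionTrigger:
    """Hypothetical model of the post-turn compaction check."""

    def __init__(self, context_window: int, reserve_tokens: int) -> None:
        self.context_window = context_window
        self.reserve_tokens = reserve_tokens
        self.compact_requested = False

    def request(self) -> None:
        # What an extension's ctx.compact() call would do mid-turn: only set a flag.
        self.compact_requested = True

    def should_run(self, context_tokens: int) -> bool:
        over_budget = context_tokens > self.context_window - self.reserve_tokens
        return self.compact_requested or over_budget


trigger = CompactionTrigger(context_window=200_000, reserve_tokens=16_000)
under = trigger.should_run(100_000)  # well below 184_000
over = trigger.should_run(190_000)   # above 184_000
trigger.request()
requested = trigger.should_run(10)   # flag forces compaction regardless of size
```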
See compaction.md for the Compaction implementation.
Agent exposes new_session(), fork(), and switch_session() — all forwarded to the Runtime if one is wired up. These are structural operations that require the agent to be idle. They do not go through the retry machinery.
Agent implements ExtensionContext, which is the object passed to every extension event handler. Extensions see:
- `cwd`: the working directory
- `session_manager`: the live `SessionManager`
- `model` / `model_registry`: current model and the full registry
- `signal`: the Engine's abort signal
- `is_idle()` / `has_pending_messages()`
- `abort()` / `shutdown()`
- `get_context_usage()`: token count relative to context window
- `get_system_prompt()`: the most recently built system prompt
- `compact()`: request compaction after the current turn
- `run_compaction()`: run compaction now (must be idle)
- `reload()`: reload resource files
- `wait_for_idle()`: await Engine settlement
- `new_session()` / `fork()` / `switch_session()`: session navigation

| Event | When |
|---|---|
| `input` | Before processing starts; raw user text |
| `before_agent_start` | After system prompt is built, before Engine run |
| `context` | After session is loaded, before AgentContext snapshot |
| `agent_end` | After all Engine turns complete and session is written |
| `save_point` | After `agent_end`; session writes are durable |
| `settled` | After `save_point` when no queued follow-up turns remain |
| `session_before_compact` | Before compaction runs; can cancel or replace |
| `session_compact` | After compaction entry is appended |
| `retry_start` | Before each retry attempt (delay not yet applied) |
| `retry_end` | After each attempt completes, with success flag |

Engine events (agent_start, turn_start, message_end, tool_execution_*, …) are re-dispatched through _on_engine_event. See engine.md for the full Engine event list.
`agent.spawn_child(tools=None) -> Agent` creates an ephemeral child agent that shares the parent's LLM instance and system prompt. Used by background reviewers.

    child = agent.spawn_child(tools=['skill'])

What the child inherits:

- Same `Engine.llm` instance (same provider credentials and base URL)
- Same `_system_prompt` and full `_system_prompt_cache` (byte-identical → prefix cache hit)
- All parent tools in the request body (same provider cache key)
- `session_id` pinned to the parent's (so any prompt rebuild still produces identical bytes)
- `mcp_manager` and `settings_manager` from the parent's tool context

What the child does not get:
- No session persistence (`persist=False`)
- No extensions
- No compaction (`NullCompaction`)
- No desktop/browser tool context (intentional: children must not control parent's automation sessions)
- Memory manager is only forwarded if `'memory'` is in the whitelist

Tool whitelist: When tools is a list, only those tools may dispatch at runtime.
All parent tools are still sent in the request body so the provider cache key is
unchanged. Non-whitelisted calls return ToolResultContent(is_error=True) immediately.
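
The split between what is advertised and what may run can be sketched as follows. `make_dispatcher` and the dict-shaped result are hypothetical; the real code returns `ToolResultContent`.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Tool = Callable[[dict], str]


def make_dispatcher(
    tools: Dict[str, Tool], whitelist: Optional[Iterable[str]]
) -> Tuple[Callable[[], list], Callable[[str, dict], dict]]:
    allowed = None if whitelist is None else set(whitelist)

    def request_tools() -> list:
        # Every parent tool is advertised, so the request body stays identical.
        return sorted(tools)

    def dispatch(name: str, args: dict) -> dict:
        if allowed is not None and name not in allowed:
            # Non-whitelisted call: immediate synthetic error, the tool never runs.
            return {"content": f"tool '{name}' is not available", "is_error": True}
        return {"content": tools[name](args), "is_error": False}

    return request_tools, dispatch


parent_tools: Dict[str, Tool] = {"skill": lambda args: "ok", "bash": lambda args: "ran"}
request_tools, dispatch = make_dispatcher(parent_tools, ["skill"])
```

Filtering at dispatch time rather than at request time is what keeps the provider cache key unchanged.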
Agent maintains a list of `Guardrail` instances in `self._guardrails`. They are loaded from files and merged with extension-registered guardrails on every `reload()`.

    reload()
      └─ _refresh_guardrails()
           ├─ file_guardrails = resources.get_guardrails()    # builtin + profile + project
           └─ ext_guardrails  = extensions.get_guardrails()   # extension-registered
              (file names win on collision)

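
The collision rule can be sketched as a keyed merge. This reads "file names win" as "the file-loaded guardrail replaces an extension guardrail with the same name"; the dict representation and `merge_guardrails` are assumptions, not the real `Guardrail` type.

```python
from typing import Dict, List


def merge_guardrails(file_guardrails: List[dict], ext_guardrails: List[dict]) -> List[dict]:
    merged: Dict[str, dict] = {g["name"]: g for g in ext_guardrails}
    # File-loaded entries overwrite extension entries that share a name.
    merged.update({g["name"]: g for g in file_guardrails})
    return list(merged.values())


file_guardrails = [{"name": "no-rm", "source": "file"}]
ext_guardrails = [
    {"name": "no-rm", "source": "ext"},
    {"name": "budget", "source": "ext"},
]
merged = merge_guardrails(file_guardrails, ext_guardrails)
```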
Turn lifecycle:
- `on_turn_start()` called on all guardrails at the start of `invoke()` to reset per-turn counters.
- `before_call()` runs inside `_before_tool_call` after extension `tool_call` handlers. A `block` decision returns a synthetic error result; `halt` sends the Engine abort signal.
- `after_call()` runs inside `_after_tool_call` after extension `tool_result` handlers. A `warn` decision appends the reason to the result content; `halt` aborts the Engine.

See guardrails.md for the full interface, loading rules, and examples.
The system prompt is rebuilt once per `invoke()` call and cached per channel:

    self._system_prompt_cache: dict[str | None, str]

Key is `opts.channel` (the incoming channel ID, or `None` for the REPL). On cache hit the rebuild is skipped entirely. The cache is cleared:

- On `reload()` (resources changed)
- When the memory review background thread completes (MEMORY.md may have been updated)

Child agents inherit the full parent cache via child._system_prompt_cache.update(...),
so any channel-specific variants are also warm for the child.
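
A minimal sketch of the per-channel cache, including the child inheritance step, might look like this. `PromptCache` and its `builds` counter are hypothetical and exist only to make cache hits observable.

```python
from typing import Callable, Dict, Optional


class PromptCache:
    """Hypothetical per-channel prompt cache keyed by channel ID (None for the REPL)."""

    def __init__(self, build: Callable[[Optional[str]], str]) -> None:
        self._build = build
        self._cache: Dict[Optional[str], str] = {}
        self.builds = 0  # counts real rebuilds, i.e. cache misses

    def get(self, channel: Optional[str]) -> str:
        if channel not in self._cache:
            self.builds += 1
            self._cache[channel] = self._build(channel)
        return self._cache[channel]

    def clear(self) -> None:
        # Called when resources change, e.g. on reload().
        self._cache.clear()

    def spawn_child(self) -> "PromptCache":
        child = PromptCache(self._build)
        child._cache.update(self._cache)  # child starts with every variant warm
        return child


cache = PromptCache(lambda channel: f"prompt for {channel}")
cache.get(None)
cache.get(None)        # hit: no rebuild
cache.get("telegram")  # miss: separate entry per channel
child = cache.spawn_child()
child_prompt = child.get("telegram")  # served from the inherited cache
```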
- engine.md — Engine loop and tool execution
- session.md — Session persistence, tree navigation
- hooks.md — Hooks system event types and result semantics
- extensions.md — Extension loading and dispatch
- compaction.md — Compaction logic
- guardrails.md — Guardrail interface, loading, built-ins