diff --git a/docs/prd.md b/docs/prd.md
index ed81cf3..235b5bd 100644
--- a/docs/prd.md
+++ b/docs/prd.md
@@ -3999,10 +3999,10 @@ Notation: `[x]` shipped, `[ ]` planned. Mirrors the README's roadmap so contribu
 - [ ] Typed hook payloads: `onSkillStart`, `onToolCall`, `onToolResult` (§8.4)
 - [ ] Typed memory strategies: `sliding`, `tokenBudget`, `summarized` namespaces (§8.5)
 - [ ] Human-in-the-loop: `confirm()` with message templates, timeouts, fallback behavior (§9.2.1)
-- [ ] Session model — multi-turn `AgentSession`, automatic compaction (`SUMMARIZE`, `SLIDING_WINDOW`, `CUSTOM`) (§5.7)
+- [x] Session model — multi-turn `AgentSession` **shipped** (#1736; `events` Flow + `await()` + snapshot/resume). _Remaining:_ automatic compaction (`SUMMARIZE` / `SLIDING_WINDOW` / `CUSTOM`) (§5.7)
 - [ ] Reactive context hooks: `beforeInference`, `afterToolCall` — context-mutating hooks that inject system reminders (§8.4)
 - [ ] `.spawn {}` — independent sub-agent lifecycle, `AgentHandle`, parent-managed join
-- [ ] `Flow` for reactive UIs + Pipeline-level events (`StageStarted`, `PipelineCompleted`, etc) — depends on streaming, sub-agents, sessions (§10.2)
+- [x] Reactive event stream for UIs **shipped** — `AgentSession.events: Flow` (#1736) + `agent.observe { }` (#965). _Remaining:_ composition-stage event types (`StageStarted`, `PipelineCompleted`) at the Pipeline level (§10.2)
 - [ ] Serialization — `agent.json`, A2A AgentCard
 - [ ] JAR bundles and folder-based assembly
 - [ ] Gradle plugin
diff --git a/docs/roadmap.md b/docs/roadmap.md
index 11f186b..b17cf83 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -70,7 +70,7 @@ The 0.6.0 epic ([#1911](../../issues/1911)) tracks the full acceptance criteria.
 - [ ] jlink minimal JRE bundle for runtime (~35MB)
 
 *Secondary:*
-- [ ] Session model — multi-turn `AgentSession`, automatic compaction (`SUMMARIZE`, `SLIDING_WINDOW`, `CUSTOM`)
+- [x] Session model — multi-turn `AgentSession` **shipped** (cold `events: Flow` + `await()` + snapshot/resume; #1736 — see "Streaming session surface" below). _Remaining:_ automatic compaction (`SUMMARIZE` / `SLIDING_WINDOW` / `CUSTOM`).
 - [x] **`onBefore*` interceptor family** — Rails-style `onBeforeSkill` / `onBeforeToolCall` / `onBeforeTurn` returning a sealed `Decision { Proceed | ProceedWith(args) | Deny(reason) | Substitute(result) }`. Sibling to today's post-hoc observer hooks (`onToolUse` / `onSkillChosen` / `onError`). Unifies per-client tool policy (McpServer), action confirmation, prompt-injection filtering (one-liner: `onBeforeTurn { msgs -> if (filter.flag(msgs)) Decision.Deny(...) else Decision.Proceed }`), and uniform `perToolTimeout` wrapping. Chain semantics: registration order, all run, first non-`Proceed` wins. ([#1907](../../issues/1907), feeds [#1908](../../issues/1908))
 - [x] Agent memory — `MemoryBank`, `memory_read`/`memory_write`/`memory_search` auto-injected tools
 - [ ] `.spawn {}` — independent sub-agent lifecycle, `AgentHandle`, parent-managed join
@@ -80,10 +80,10 @@ The 0.6.0 epic ([#1911](../../issues/1911)) tracks the full acceptance criteria.
 - [x] **Enforce `perToolTimeout` on session-aware tool path** — `sessionExecutor` calls now respect `budget.perToolTimeout`, emit failed `ToolCallFinished` events on timeout, and surface `BudgetExceededException(PER_TOOL_TIMEOUT)`. ([#1903](../../issues/1903))
 - [x] **Streaming docs reconcile** — README Limitations / Roadmap bullets are tagged as shipped / experimental / planned; the stale "no per-adapter native streaming yet" wording is gone, and DeepSeek is called out as using the OpenAI-compatible SSE path. ([#1901](../../issues/1901))
 - [x] Per-adapter native streaming overrides — Anthropic SSE (`ClaudeClient.chatStream`), OpenAI SSE (`OpenAiClient.chatStream`), Ollama NDJSON `stream: true` (`OllamaClient.chatStream`) all emit real partial chunks at the wire. Live integration tests measure 19 / 2 / 19 chunks per response respectively. See [v0.5.0 streaming premortem](premortem-0.5.0-streaming.md)
-- [ ] `Flow` for reactive UIs + Pipeline-level events (`StageStarted`, `PipelineCompleted`, etc) — built on top of `LlmChunk`; depends on sub-agents and sessions
-- [ ] **Multimodal input** — vision and audio content blocks on LLM messages.
-  - **Image input:** vision-capable adapters accept image bytes + media type as a content block alongside text. Targets: Anthropic (`image` content blocks), OpenAI (`image_url` / base64 in content), Ollama (`llava` / `bakllava` via `images` field), Google Gemini.
-  - **Audio input:** true audio input (Gemini, GPT-4o-audio) — `LlmContent.Audio` block. Optional STT-only helper `audio.transcribe(file)` for the Whisper-style use case.
+- [x] Reactive event stream for UIs — **shipped**: `AgentSession.events: Flow` (#1736) + `agent.observe { (PipelineEvent) -> }` (#965); a UI consumes the typed agent stream today (`Token` / `ToolCall*` / `SkillStarted` / `SkillCompleted` / `Completed` / `Failed`). _Remaining:_ composition-stage event types (`StageStarted`, `PipelineCompleted`) at the Pipeline level.
+- [x] **Multimodal input — image/document (vision): SHIPPED** end-to-end across **Anthropic / OpenAI / Ollama** via `Content.Image` → `ImagePart` → provider wire and `agent.invokeWithAttachments` (#2466–#2470). The `Content` sealed type (`Text / Image / Audio / Video / Document`, each with a typed `ContentRef` + closed mime) is in place. _Remaining:_ audio/video input (below) + a Gemini provider to extend vision to.
+  - **Image/document input — [x] shipped** for Anthropic / OpenAI / Ollama (image bytes + media type as a content block alongside text; Gemini pending — no provider yet).
+  - **Audio/video input — [ ] remaining:** `Content.Audio` / `Content.Video` variants are typed but not yet sent to providers (Gemini, GPT-4o-audio). Optional STT-only helper `audio.transcribe(file)` for the Whisper-style use case.
   - **Architectural change:** `LlmMessage.content: String` needs to evolve into a `List` sealed type (Text / Image / Audio blocks). Binary-compat risk: add a sibling `contentBlocks: List?` field first with the existing String form auto-coerced into a single Text block; deprecate the String form once the API surface settles. Typed boundaries are unaffected — `Agent` (image classifier) and `Agent` (transcriber) become coherent agent shapes.
 - [ ] Serialization — `agent.json`, A2A AgentCard
 - [ ] JAR bundles and folder-based assembly