feat(mcp-apps): stream partial tool input for progressive App rendering (SEP-1865) #417
Merged
…ng (SEP-1865)
MCP Apps that render progressively from streaming tool arguments (e.g. Excalidraw's guided camera tour, which animates the viewport to each `cameraUpdate` pseudo-element as it arrives) only worked on hosts that stream `ui/notifications/tool-input-partial` while the model generates the call. Our host mounted the App frame only AFTER `tool_result` and sent the complete `tool-input` once, so the App always took its static "snap to final viewport" path: the diagram drew but never toured.
Close the gap end to end:
- Backend: mount the frame EARLY at the tool's `content_block_start` (the resource shell is static per resourceUri, so `resources/read` needs only the tool name) so the App's bridge is live while args stream. Accumulate the streamed `toolUse.input` fragments per toolUseId, heal the partial JSON into a valid object (new `apis/shared/mcp_apps/partial_json.py`), and emit a new `ui_tool_input_partial` SSE per delta. Dedupe the early mount against the legacy post-`tool_result` path; persistence rides the shared emit helper.
- Frontend: parse/validate/route `ui_tool_input_partial` → `McpAppStateService.recordPartialInput` (new partial-input signal). The bridge gains `sendToolInputPartial`/`sendToolInputFinal`; `pushToolData` now sends the complete `tool-input` only when the input is final (new optional deps `getPartialToolInput`/`isToolInputFinal`; absent ⇒ PR #4 final-only behavior, fully back-compat). The frame relays partials while streaming and the final input + result once complete.
- Docs: CLAUDE.md SSE table: early-mount note on `ui_resource` + new `ui_tool_input_partial` row.
Tests: heal_partial_json units (string/array/object closure, dangling key/sep, nested, embedded-JSON-string elements); coordinator early-mount + dedupe + partial-emit + healing; frontend validator/routing, state service partial-input, bridge partial→final ordering + late-partial guard. Backend sweep + frontend mcp-apps/stream-parser/component specs green; tsc clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
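The accumulate-and-heal step described in this commit can be sketched roughly as follows. This is a minimal illustration, not the contents of `apis/shared/mcp_apps/partial_json.py`: the back-off strategy and the `PartialInputAccumulator` class are assumptions made for the example, and only the name `heal_partial_json` comes from the commit message.

```python
import json
from collections import defaultdict
from typing import Any


def heal_partial_json(fragment: str) -> dict[str, Any]:
    """Best effort: recover a valid JSON object from a truncated prefix."""
    stack: list[str] = []             # closers owed for each open container
    cuts: list[tuple[int, str]] = []  # (end index, closers) fallback truncation points
    in_string = escaped = False
    for i, ch in enumerate(fragment):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
            cuts.append((i + 1, "".join(reversed(stack))))
        elif ch in "}]" and stack:
            stack.pop()
        elif ch == ",":
            cuts.append((i, "".join(reversed(stack))))

    # Optimistic attempt: keep everything, close an open string, then containers.
    tail = fragment
    if in_string:
        tail = (tail[:-1] if escaped else tail) + '"'  # drop a dangling backslash
    candidates = [tail + "".join(reversed(stack))]
    # Fallbacks: discard the incomplete tail (dangling key, separator, literal).
    candidates += [fragment[:end] + closers for end, closers in reversed(cuts)]

    for candidate in candidates:
        try:
            value = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return {}


class PartialInputAccumulator:
    """Hypothetical helper: one growing buffer per toolUseId."""

    def __init__(self) -> None:
        self._buffers: dict[str, str] = defaultdict(str)

    def feed(self, tool_use_id: str, delta: str) -> dict[str, Any]:
        self._buffers[tool_use_id] += delta
        return heal_partial_json(self._buffers[tool_use_id])
```

Each call to `feed` would correspond to one `ui_tool_input_partial` emit. The sketch rescans the whole buffer on every delta, which is simple but quadratic over a long stream; an incremental scanner would avoid that.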
…tool_result
Keying frame finality on the tool result made the App wait for the tool to
execute before it received the complete `tool-input` — so the last streamed
element (the partial path drops a possibly-incomplete tail) only appeared once
the result landed. The stream parser already distinguishes the states: an
in-flight tool-use block carries `input: {}` (its accumulating JSON can't
parse), and the parsed object appears only when the block finalizes at
`content_block_stop`. Key `inputFinal` on a non-empty `lookupToolInput()` so
the App gets the complete, fully-rendered `tool-input` the moment arguments
finish streaming; keep the result as a fallback for empty-input tools and the
reload path. tsc + mcp-apps/tool-use/assistant-message specs green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t (blank canvas)
Root-caused live (Chrome MCP against the local app) the long-standing "blank
iframe": the App shell loaded fine and the bridge handshake completed, but the
Excalidraw canvas drew nothing because it received an EMPTY `tool-input`.
The frame's `getToolInput()` read only `lookupToolInput()` (the live stream
parser's `allMessages()`), which is empty for an MCP App frame by the time the
frame relays the final input — the parsed tool-use block isn't in the live
stream anymore once the turn finishes, and the mount is often deferred further
behind the capability-consent prompt (the App declares `clipboardWrite`, so the
iframe is held until the user clicks Allow — well after the turn completed).
Result: `ontoolinput({})` → no `elements` → blank. This predates the streaming
work; the App had always been getting empty input from the stream-parser path.
The streamed partial-input feature already captures the complete arguments in
`McpAppStateService` (verified: stream parser empty, partial holds the full
~5.6KB `elements` string). Resolve the final `tool-input` from the parser when
present, else fall back to that captured partial (`resolvedToolInput()`).
`inputFinal` stays keyed on stream completion so partials still drive the live
tour. Verified live: the diagram now renders (User / AI Agent / App actors +
flow arrows) where it was blank.
tsc clean; mcp-apps + tool-use specs green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ty consent prompt
The PR #6 render-time capability-consent gate prompted the user (camera / mic / geo / clipboard) and HELD the iframe mount until answered. For Excalidraw that surfaced a "This app requests clipboard — Allow" prompt the SEP-1865 reference host (and Claude) never shows, and, because it deferred the mount past the argument-streaming window, it also blocked progressive rendering.
Match the reference: map the App's declared `_meta.ui.permissions` straight onto the iframe `allow` (Permissions-Policy) and mount immediately. Delegating a feature via `allow` does not activate it: the browser still prompts at use-time for camera/microphone/geolocation, and clipboard-write is low-risk and needs no prompt. `ui/open-link` still routes through `McpAppConsentService`.
Removes `requestedCaps`/`capabilityGrant`/`capabilitiesResolved` and the consent effect; `effectivePermissions` is now just the declared permissions; `proxyUrl` no longer waits on consent.
KNOWN FOLLOW-UP: mounting early (during streaming) exposes a separate issue: this App renders blank when it receives the partial tool-input stream then the final, while final-only input renders fine. Tracked for dedicated debugging (see memory project_mcp_apps_streaming_tool_input_gap); not addressed here.
tsc app+spec clean; mcp-apps + consent-prompt + tool-use specs green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Excalidraw App's progressive camera tour flashed all at once: the streamed `ui_tool_input_partial` events reached the SPA in a ~1s burst ~8–10s after the tool's content_block_start, instead of spread over the ~10s the model takes to generate the args.
Root cause is Bedrock, not our pipeline. Localized with per-event yield timing, a concurrent heartbeat, and a raw-chunk probe inside Strands' BedrockModel._stream:
- every backend SSE yield-block was <13ms (zero consumer backpressure);
- the event loop ticked steadily through the whole stall (no sync blocker, which rules out hooks / session_manager / SequentialToolExecutor);
- the raw boto3 chunks confirmed it: after the tool's contentBlockStart + first empty toolUse delta, Bedrock delivers nothing for ~8–10s then dumps all ~860–1100 input_json_delta chunks in one ~1s burst.
Reproduced offline with a raw converse_stream call; the trigger is Anthropic's default tool-use behavior (input JSON is buffered for schema validation, deltas flushed only when the block completes). Not model-specific (Haiku and Sonnet both burst), not maxTokens, not caching.
Fix: enable Anthropic's fine-grained tool streaming beta (fine-grained-tool-streaming-2025-05-14) in ModelConfig.to_bedrock_config via the existing additional_request_fields -> additionalModelRequestFields path, scoped to Bedrock Claude models and gated on the MCP Apps host flag (opted-out envs keep Anthropic's JSON-validated tool input), merging into any existing thinking/top_k block. Offline replay: without -> 10.3s silence + 1s burst; with -> deltas spread evenly over ~8s. Verified live: the backend now emits ~520 partials over ~11s (~43/s) and the diagram builds progressively on screen.
Frontend: remove the host-side 200ms pacer that reconstructed a tour from the burst; it is now counterproductive, as it would decouple the tour from the real stream. Partials relay directly as they arrive, gated on bridge.viewIsInitialized, with pre-handshake catch-up handled by the getPartialToolInput init seed (avoids a preInitQueue re-burst). The inputComplete blank-canvas fix (finality keyed on the real tool result, not the success stub) is retained.
Tests: 4 new model_config cases for the beta (Claude/non-Claude, flag on/off, thinking-merge); full backend suite green; frontend bridge/state/assistant-message specs green; tsc clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
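The gating and merge behavior described in the fix might look something like the sketch below. It is not the actual `ModelConfig.to_bedrock_config` code: the function name, the `"anthropic.claude"` substring check for Claude model IDs, and the `anthropic_beta` key inside the additional request fields are all assumptions made for illustration.

```python
from typing import Any, Optional

FINE_GRAINED_BETA = "fine-grained-tool-streaming-2025-05-14"


def with_fine_grained_tool_streaming(
    model_id: str,
    mcp_apps_enabled: bool,
    additional_request_fields: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Return request fields with the beta merged in when it applies."""
    fields = dict(additional_request_fields or {})  # never mutate the caller's dict

    # Assumption: Bedrock Claude model IDs and inference profiles contain this substring.
    is_bedrock_claude = "anthropic.claude" in model_id
    if not (mcp_apps_enabled and is_bedrock_claude):
        return fields  # opted-out envs and non-Claude models are left untouched

    # Assumption: betas are passed as a list under "anthropic_beta".
    betas = list(fields.get("anthropic_beta", []))
    if FINE_GRAINED_BETA not in betas:
        betas.append(FINE_GRAINED_BETA)
    fields["anthropic_beta"] = betas
    return fields
```

Merging into a copy keeps any existing `thinking` or `top_k` entries intact, which is the "thinking-merge" case the commit's tests mention.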
On reload the MCP App frame mounted (the UI resource is persisted and
rehydrated via the GET /messages uiResources sidecar) but rendered a blank
canvas: resolvedToolInput() only consulted the live stream parser and the
live-only captured partial, both empty after a refresh, so the frame sent
an empty tool-input. The arguments were actually available — GET /messages
returns the persisted toolUse.input and message-map preserves it
(...toolUse) — they just weren't forwarded to the frame.
Thread the persisted toolUse.input through the mcp_app_frame block into a
new toolInput frame input, and use it as the final fallback in
resolvedToolInput (live parser -> captured partial -> persisted input).
Also key inputFinal on a non-empty persisted input so an interrupted tool
(input persisted, no result) still renders. Inert on the live path: the
stream parser leaves an in-flight block's input empty until
content_block_stop, so toolInput is {} during streaming and never
pre-empts the tour. No tour is replayed on reload — the App snaps to the
complete input (the spec's tool-input final path).
Frontend-only, no backend change. Verified live: loading a persisted
Excalidraw session now renders the full diagram (was blank); the live tour
still animates progressively and completes. tsc clean; assistant-message /
bridge / state specs green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
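The fallback order in this commit (live parser -> captured partial -> persisted input) is frontend TypeScript in the real change; the sketch below restates the same precedence in Python purely as an illustration. The function signatures are hypothetical, and the finality rule is a simplified reading of the commit messages, not the component's actual computed.

```python
from typing import Any, Mapping, Optional

ToolInput = Optional[Mapping[str, Any]]


def resolved_tool_input(
    parser_input: ToolInput,
    captured_partial: ToolInput,
    persisted_input: ToolInput,
) -> dict[str, Any]:
    """Pick the first non-empty source, in the order the commit describes."""
    for candidate in (parser_input, captured_partial, persisted_input):
        if candidate:  # None and {} both mean "nothing available here"
            return dict(candidate)
    return {}


def input_final(
    parser_input: ToolInput,
    persisted_input: ToolInput,
    has_tool_result: bool,
) -> bool:
    """Assumed rule: final once a complete input exists or the result landed."""
    return bool(parser_input) or bool(persisted_input) or has_tool_result
```

During live streaming both the parser input and the persisted input are empty, so `input_final` stays false and partials keep driving the tour; after a reload only the persisted input is populated, so the frame snaps straight to the complete diagram.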
Render the MCP App frame with a connected header bar showing server icon, server name, and tool name (shimmering while the tool runs), replacing the separate minimized tool card, and stop the tool rail from flashing before the frame mounts.
Backend:
- Capture `serverInfo` (name/title/icons) off the MCP `initialize` in `_UIExtensionClientSession` (neither the SDK session nor Strands retains it).
- `ui_resource` now carries `serverName` (serverInfo.title→name→`ui://` authority), `icon` (serverInfo icons; empty→glyph), and `toolName`.
- Two-phase emit at `content_block_start`: an instant header-only shell (no `resources/read`) then the full html-bearing resource, so the header shows immediately. Persist the new fields for reload survival.
Frontend:
- Fold the header into `mcp-app-frame` (icon w/ glyph fallback, server·tool, `</>` request/response toggle); drop the redundant minimized card for MCP Apps.
- Gate the iframe mount on a non-empty html with a loading skeleton.
- Source the tool name from the `ui_resource` so the name + shimmer appear atomically with the frame's promotion.
- Fix the shimmer CSS: it built the gradient from `currentColor` while setting `color: transparent`, painting the clipped text invisible the whole time the tool ran; use explicit gray tones (light + dark).
Docs: CLAUDE.md SSE table updated for the enriched `ui_resource` payload.
Backend 3176 passed; frontend 1131 passed; tsc clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
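The `serverName` precedence above (serverInfo.title, then name, then the `ui://` authority) can be illustrated with a small sketch. The function name and the example resource URI are hypothetical; only the fallback order comes from the commit message.

```python
from typing import Any, Mapping, Optional
from urllib.parse import urlsplit


def display_server_name(
    server_info: Optional[Mapping[str, Any]],
    resource_uri: str,
) -> str:
    """title -> name -> authority of the ui:// resource URI."""
    info = server_info or {}
    # urlsplit treats the part after "ui://" as the authority (netloc).
    return info.get("title") or info.get("name") or urlsplit(resource_uri).netloc


# Hypothetical URI, for illustration only.
print(display_server_name({}, "ui://excalidraw/app.html"))
```

Using `or` rather than a key-presence check means an empty `title` falls through to `name`, which matches the "empty→glyph" style of fallback used for the icon.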
…st (SEP-1865)
The MCP runtime protocol carries no icon: Excalidraw's `serverInfo`/tools expose none, and its logo lives only in the MCPB bundle `manifest.json` (`"icon": "docs/logo.png"`), which Claude inlines from the *installed bundle* as a data: URI. But many deployable MCP-App servers also serve that manifest + icon over HTTP at their origin (verified: https://mcp.excalidraw.com/manifest.json + /docs/logo.png), so resolve it server-side and base64-inline it onto the `ui_resource` `icon`, mirroring what Claude does, with no admin config.
- `resolve_server_icon` / `get_cached_server_icon` (mcp_apps.py): fetch `<origin>/manifest.json`, resolve its `icon` SAME-ORIGIN only (bounds SSRF to the admin-trusted MCP origin; foreign icon URLs refused), GET the image (5s timeout, 256KB cap, image/* only), base64-inline as a data: URI. Cached per origin (incl. "" on miss), pre-warmed at `tools/list` so request-path resolvers stay cache lookups. `serverInfo.icons` still take precedence.
- Plumb `server_url` onto `UICapableMCPClient` (Strands only gets a transport callable) so the origin is derivable; passed from external_mcp_client.
- Size-gate persistence: a large data: URI icon is not stored (protects the 400KB DynamoDB item limit alongside gzipped HTML), so it shows live but reloads to the generic glyph.
- Docs: CLAUDE.md `ui_resource` row updated for the served-manifest icon path.
Backend 3182 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
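The same-origin rule and the inlining limits described above can be sketched without any network code. This is an illustrative approximation, not `resolve_server_icon` itself: the helper names are hypothetical, and the HTTP fetch, 5s timeout, and per-origin cache are deliberately left out.

```python
import base64
from typing import Optional
from urllib.parse import urljoin, urlsplit

MAX_ICON_BYTES = 256 * 1024  # the 256KB cap from the commit message


def origin_of(url: str) -> str:
    """scheme://host[:port], lowercased for comparison."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def resolve_same_origin_icon(server_url: str, manifest_icon: str) -> Optional[str]:
    """Resolve the manifest's icon reference; refuse anything off-origin."""
    manifest_url = urljoin(origin_of(server_url) + "/", "manifest.json")
    icon_url = urljoin(manifest_url, manifest_icon)
    if urlsplit(icon_url).scheme not in ("http", "https"):
        return None  # e.g. data: or file: references
    if origin_of(icon_url) != origin_of(server_url):
        return None  # foreign host: would widen the SSRF surface
    return icon_url


def to_data_uri(body: bytes, content_type: str) -> Optional[str]:
    """Inline an image as a data: URI, enforcing the type and size limits."""
    mime = content_type.split(";", 1)[0].strip().lower()
    if not mime.startswith("image/") or len(body) > MAX_ICON_BYTES:
        return None
    return f"data:{mime};base64,{base64.b64encode(body).decode('ascii')}"
```

Checking the origin after `urljoin` rather than before means absolute and protocol-relative icon references are caught by the same comparison as relative ones.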
Problem
MCP Apps that render progressively from streaming tool arguments (e.g. the official excalidraw/excalidraw-mcp guided camera tour, which animates the viewport to each `cameraUpdate` pseudo-element as it streams in) render their diagram but never tour in our host. The model narrates "progressive camera movements to guide you through each stage," but nothing moves.
Root cause (not CSP, not the gzip/DynamoDB persistence work): the camera tour is driven by `ui/notifications/tool-input-partial` (→ ext-apps SDK `ontoolinputpartial`), which a host streams while the model generates the tool arguments. Our host (a) mounted the App frame only after `tool_result`, when the entire argument-streaming window was already over, and (b) sent the complete `tool-input` exactly once. So the App only ever took its `ontoolinput` (final) path → static "snap to final viewport." The `M_TOOL_INPUT_PARTIAL` constant existed but was dead code.
What this does
Streams a UI tool's arguments to the App as the model generates them, end to end:
- `ui_resource` now emits at the tool's `content_block_start` (the resource shell is static per `resourceUri`, so `resources/read` needs only the tool name), so the App's bridge is live during argument streaming. Deduped against the legacy post-`tool_result` fallback.
- Accumulate the streamed `toolUse.input` fragments per `toolUseId`, server-side "heal" the partial JSON into a valid object (new `apis/shared/mcp_apps/partial_json.py`), and emit a new `ui_tool_input_partial` SSE per delta → relayed to the App as `ui/notifications/tool-input-partial`.
- `tool-input` is sent the moment arguments finish streaming (`content_block_stop`, detected via the parsed `lookupToolInput()`), not when the tool result lands, so the full diagram (incl. the last element) renders without lag. `tool_result` remains a fallback (empty-input tools, reload path).
Key files
- Backend: `apis/shared/mcp_apps/partial_json.py` (new); `agents/main_agent/streaming/stream_coordinator.py` (`_emit_ui_resource_for_tool` early mount + dedupe, `_emit_tool_input_partial`).
- Frontend: `shared/utils/stream-parser/*` (type + validator + `onToolInputPartial`); `mcp-app-state.service.ts` (partial-input signal); `mcp-app-bridge.ts` (`sendToolInputPartial`/`sendToolInputFinal`, gated `pushToolData`, back-compat); `mcp-app-frame.component.ts` (partial→final effects, `inputFinal`).
- Docs: CLAUDE.md SSE table (`ui_resource` early-mount note + `ui_tool_input_partial` row).
Tests
- `heal_partial_json` units (string/array/object closure, dangling key/sep, nested, embedded-JSON-string elements); coordinator early-mount + dedupe + partial-emit + healing. Backend sweep (193) + healer (12) green; import boundaries intact.
- `tsc` app + spec clean.
Draft status / follow-ups
- `mcp-app-frame.component.spec.ts` for the `inputFinal` computed (bridge-level ordering is covered; frame logic is typechecked + exercised indirectly).
- `lookupToolInput` is empty after refresh (pre-existing).
🤖 Generated with Claude Code