feat(mcp-apps): stream partial tool input for progressive App rendering (SEP-1865)#417

Merged
philmerrell merged 8 commits into develop from feature/mcp-apps-streaming-tool-input
Jun 1, 2026

@philmerrell
Contributor

Problem

MCP Apps that render progressively from streaming tool arguments — e.g. the official excalidraw/excalidraw-mcp guided camera tour, which animates the viewport to each cameraUpdate pseudo-element as it streams in — render their diagram but never tour in our host. The model narrates "progressive camera movements to guide you through each stage," but nothing moves.

Root cause (not CSP, not the gzip/DynamoDB persistence work): the camera tour is driven by ui/notifications/tool-input-partial (→ ext-apps SDK ontoolinputpartial), which a host streams while the model generates the tool arguments. Our host (a) mounted the App frame only after tool_result — the entire argument-streaming window was already over — and (b) sent the complete tool-input exactly once. So the App only ever took its ontoolinput (final) path → static "snap to final viewport." The M_TOOL_INPUT_PARTIAL constant existed but was dead code.

What this does

Streams a UI tool's arguments to the App as the model generates them, end to end:

  1. Early frame mount: ui_resource now emits at the tool's content_block_start (the resource shell is static per resourceUri, so resources/read needs only the tool name), so the App's bridge is live during argument streaming. Deduped against the legacy post-tool_result fallback.
  2. Partial input stream — accumulate the streamed toolUse.input fragments per toolUseId, server-side "heal" the partial JSON into a valid object (new apis/shared/mcp_apps/partial_json.py), and emit a new ui_tool_input_partial SSE per delta → relayed to the App as ui/notifications/tool-input-partial.
  3. Final on input-stream completion — the complete tool-input is sent the moment arguments finish streaming (content_block_stop, detected via the parsed lookupToolInput()), not when the tool result lands — so the full diagram (incl. the last element) renders without lag. tool_result remains a fallback (empty-input tools, reload path).
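To make the "heal the partial JSON" step in item 2 concrete, here is a minimal sketch of one way to do it. It is not the code in `apis/shared/mcp_apps/partial_json.py`; the name `heal_partial_json_sketch` and the back-off strategy are assumptions chosen for brevity. It closes any open string and containers, and if the result still does not parse (dangling key, trailing separator, half-written literal) it drops one trailing character and retries.

```python
import json
from typing import Any, Dict, List


def _closing_suffix(prefix: str) -> str:
    """Return the characters needed to close open strings/containers in `prefix`."""
    stack: List[str] = []
    in_string = False
    escaped = False
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()
    return ('"' if in_string else "") + "".join(reversed(stack))


def heal_partial_json_sketch(fragment: str) -> Dict[str, Any]:
    """Hypothetical healer: longest prefix of `fragment` that closes into a JSON object.

    Quadratic in the worst case; a production healer would likely be single-pass.
    """
    for cut in range(len(fragment), 0, -1):
        prefix = fragment[:cut]
        try:
            value = json.loads(prefix + _closing_suffix(prefix))
        except json.JSONDecodeError:
            continue  # dangling key/separator/escape: drop one more character
        if isinstance(value, dict):
            return value
    return {}


print(heal_partial_json_sketch('{"title": "Hel'))  # {'title': 'Hel'}
print(heal_partial_json_sketch('{"a": 1, "b"'))    # {'a': 1}
```

Note that a healer of this kind can expose a truncated string or number (`"Hel"` above) that later grows, so consumers should treat every partial as provisional until the final `tool-input` arrives.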

Key files

  • Backend: apis/shared/mcp_apps/partial_json.py (new); agents/main_agent/streaming/stream_coordinator.py (_emit_ui_resource_for_tool early mount + dedupe, _emit_tool_input_partial).
  • Frontend: shared/utils/stream-parser/* (type + validator + onToolInputPartial); mcp-app-state.service.ts (partial-input signal); mcp-app-bridge.ts (sendToolInputPartial/sendToolInputFinal, gated pushToolData, back-compat); mcp-app-frame.component.ts (partial→final effects, inputFinal).
  • Docs: CLAUDE.md SSE table (ui_resource early-mount note + ui_tool_input_partial row).

Tests

  • Backend: heal_partial_json units (string/array/object closure, dangling key/sep, nested, embedded-JSON-string elements); coordinator early-mount + dedupe + partial-emit + healing. Backend sweep (193) + healer (12) green; import boundaries intact.
  • Frontend: validator + routing; state-service partial input; bridge partial→final ordering + late-partial guard + pre-init queue. mcp-apps / stream-parser / component specs green; tsc app + spec clean.

Draft status / follow-ups

  • ⚠️ Not yet verified live against a deployed inference-api + the real Excalidraw MCP — the decisive check. Needs a deploy.
  • Per-delta emit is un-throttled (could coalesce for very chatty streams).
  • No dedicated mcp-app-frame.component.spec.ts for the inputFinal computed (bridge-level ordering is covered; frame logic is typechecked + exercised indirectly).
  • Reload path still has no partials (live-only by design) and lookupToolInput is empty after refresh (pre-existing).

🤖 Generated with Claude Code

philmerrell and others added 2 commits May 31, 2026 17:28
…ng (SEP-1865)

MCP Apps that render progressively from streaming tool arguments — e.g.
Excalidraw's guided camera tour, which animates the viewport to each
`cameraUpdate` pseudo-element as it arrives — only worked on hosts that
stream `ui/notifications/tool-input-partial` while the model generates the
call. Our host mounted the App frame only AFTER `tool_result` and sent the
complete `tool-input` once, so the App always took its static "snap to final
viewport" path: the diagram drew but never toured.

Close the gap end to end:

- Backend: mount the frame EARLY at the tool's `content_block_start` (the
  resource shell is static per resourceUri, so `resources/read` needs only
  the tool name) so the App's bridge is live while args stream. Accumulate
  the streamed `toolUse.input` fragments per toolUseId, heal the partial JSON
  into a valid object (new `apis/shared/mcp_apps/partial_json.py`), and emit a
  new `ui_tool_input_partial` SSE per delta. Dedupe the early mount against the
  legacy post-`tool_result` path; persistence rides the shared emit helper.

- Frontend: parse/validate/route `ui_tool_input_partial` →
  `McpAppStateService.recordPartialInput` (new partial-input signal). The
  bridge gains `sendToolInputPartial`/`sendToolInputFinal`; `pushToolData` now
  sends the complete `tool-input` only when the input is final (new optional
  deps `getPartialToolInput`/`isToolInputFinal`; absent ⇒ PR #4 final-only
  behavior, fully back-compat). The frame relays partials while streaming and
  the final input + result once complete.

- Docs: CLAUDE.md SSE table — early-mount note on `ui_resource` + new
  `ui_tool_input_partial` row.

Tests: heal_partial_json units (string/array/object closure, dangling
key/sep, nested, embedded-JSON-string elements); coordinator early-mount +
dedupe + partial-emit + healing; frontend validator/routing, state service
partial-input, bridge partial→final ordering + late-partial guard. Backend
sweep + frontend mcp-apps/stream-parser/component specs green; tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tool_result

Keying frame finality on the tool result made the App wait for the tool to
execute before it received the complete `tool-input` — so the last streamed
element (the partial path drops a possibly-incomplete tail) only appeared once
the result landed. The stream parser already distinguishes the states: an
in-flight tool-use block carries `input: {}` (its accumulating JSON can't
parse), and the parsed object appears only when the block finalizes at
`content_block_stop`. Key `inputFinal` on a non-empty `lookupToolInput()` so
the App gets the complete, fully-rendered `tool-input` the moment arguments
finish streaming; keep the result as a fallback for empty-input tools and the
reload path. tsc + mcp-apps/tool-use/assistant-message specs green.
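The finality rule in this commit can be written down as a small predicate. The real logic lives in the TypeScript frame component; the Python below is only a sketch of the decision, and the function names are hypothetical.

```python
from typing import Any, Mapping


def is_input_final(parsed_input: Mapping[str, Any], has_tool_result: bool) -> bool:
    """Input is final once the parser exposes a non-empty object.

    An in-flight block carries `{}`; the parsed object appears only when the
    block finalizes. The tool result is kept as a fallback for tools whose
    input really is empty.
    """
    return bool(parsed_input) or has_tool_result


def relay_kind(parsed_input: Mapping[str, Any], has_tool_result: bool) -> str:
    """Which notification the host should relay for the current state."""
    if is_input_final(parsed_input, has_tool_result):
        return "tool-input"
    return "tool-input-partial"


print(relay_kind({}, False))                # tool-input-partial
print(relay_kind({"elements": []}, False))  # tool-input
print(relay_kind({}, True))                 # tool-input
```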

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t (blank canvas)

Root-caused live (Chrome MCP against the local app) the long-standing "blank
iframe": the App shell loaded fine and the bridge handshake completed, but the
Excalidraw canvas drew nothing because it received an EMPTY `tool-input`.

The frame's `getToolInput()` read only `lookupToolInput()` (the live stream
parser's `allMessages()`), which is empty for an MCP App frame by the time the
frame relays the final input — the parsed tool-use block isn't in the live
stream anymore once the turn finishes, and the mount is often deferred further
behind the capability-consent prompt (the App declares `clipboardWrite`, so the
iframe is held until the user clicks Allow — well after the turn completed).
Result: `ontoolinput({})` → no `elements` → blank. This predates the streaming
work; the App had always been getting empty input from the stream-parser path.

The streamed partial-input feature already captures the complete arguments in
`McpAppStateService` (verified: stream parser empty, partial holds the full
~5.6KB `elements` string). Resolve the final `tool-input` from the parser when
present, else fall back to that captured partial (`resolvedToolInput()`).
`inputFinal` stays keyed on stream completion so partials still drive the live
tour. Verified live: the diagram now renders (User / AI Agent / App actors +
flow arrows) where it was blank.

tsc clean; mcp-apps + tool-use specs green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
philmerrell and others added 5 commits May 31, 2026 20:29
…ty consent prompt

The PR #6 render-time capability-consent gate prompted the user (camera / mic /
geo / clipboard) and HELD the iframe mount until answered. For Excalidraw that
surfaced a "This app requests clipboard — Allow" prompt the SEP-1865 reference
host (and Claude) never shows, and — because it deferred the mount past the
argument-streaming window — it also blocked progressive rendering.

Match the reference: map the App's declared `_meta.ui.permissions` straight
onto the iframe `allow` (Permissions-Policy) and mount immediately. Delegating
a feature via `allow` does not activate it — the browser still prompts at
use-time for camera/microphone/geolocation, and clipboard-write is low-risk and
needs no prompt. `ui/open-link` still routes through `McpAppConsentService`.
Removes `requestedCaps`/`capabilityGrant`/`capabilitiesResolved` and the consent
effect; `effectivePermissions` is now just the declared permissions; `proxyUrl`
no longer waits on consent.
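As an illustration of mapping declared permissions onto the iframe `allow` attribute, here is a minimal sketch. It assumes `_meta.ui.permissions` is an object keyed by camelCase names such as `clipboardWrite` (only that key is confirmed by this PR); the mapping table and the function name are hypothetical. Unknown keys are ignored rather than delegated.

```python
from typing import Any, Mapping, Optional

# Assumed mapping from declared permission keys to Permissions-Policy features.
_FEATURE_BY_PERMISSION = {
    "camera": "camera",
    "microphone": "microphone",
    "geolocation": "geolocation",
    "clipboardWrite": "clipboard-write",
}


def build_iframe_allow(permissions: Optional[Mapping[str, Any]]) -> str:
    """Build the `allow` attribute value from an App's declared permissions.

    Delegation only: the browser still prompts at use-time for
    camera/microphone/geolocation.
    """
    if not permissions:
        return ""
    features = [
        feature
        for key, feature in _FEATURE_BY_PERMISSION.items()
        if key in permissions
    ]
    return "; ".join(features)


print(build_iframe_allow({"clipboardWrite": {}}))  # clipboard-write
```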

KNOWN FOLLOW-UP: mounting early (during streaming) exposes a separate issue —
this App renders blank when it receives the partial tool-input stream then the
final, while final-only input renders fine. Tracked for dedicated debugging
(see memory project_mcp_apps_streaming_tool_input_gap); not addressed here.

tsc app+spec clean; mcp-apps + consent-prompt + tool-use specs green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Excalidraw App's progressive camera tour flashed all at once: the
streamed `ui_tool_input_partial` events reached the SPA in a ~1s burst
~8–10s after the tool's content_block_start, instead of spread over the
~10s the model takes to generate the args.

Root cause is Bedrock, not our pipeline. Localized with per-event yield
timing, a concurrent heartbeat, and a raw-chunk probe inside Strands'
BedrockModel._stream: every backend SSE yield-block was <13ms (zero
consumer backpressure), the event loop ticked steadily through the whole
stall (no sync blocker — rules out hooks / session_manager /
SequentialToolExecutor), and the raw boto3 chunks confirmed it — after
the tool's contentBlockStart + first empty toolUse delta, Bedrock
delivers nothing for ~8–10s then dumps all ~860–1100 input_json_delta
chunks in one ~1s burst. Reproduced offline with a raw converse_stream
call; the trigger is Anthropic's default tool-use behavior (input JSON is
buffered for schema validation, deltas flushed only when the block
completes). Not model-specific (Haiku and Sonnet both burst), not
maxTokens, not caching.

Fix: enable Anthropic's fine-grained tool streaming beta
(fine-grained-tool-streaming-2025-05-14) in ModelConfig.to_bedrock_config
via the existing additional_request_fields -> additionalModelRequestFields
path, scoped to Bedrock Claude models and gated on the MCP Apps host flag
(opted-out envs keep Anthropic's JSON-validated tool input), merging into
any existing thinking/top_k block. Offline replay: without -> 10.3s
silence + 1s burst; with -> deltas spread evenly over ~8s. Verified live:
the backend now emits ~520 partials over ~11s (~43/s) and the diagram
builds progressively on screen.
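A rough sketch of the merge described in the fix, under stated assumptions: that the beta is requested through an `anthropic_beta` list inside `additionalModelRequestFields`, and that a Bedrock Claude model can be recognized by `anthropic.claude` in its model ID. The function name and signature are hypothetical and do not mirror `ModelConfig.to_bedrock_config`.

```python
import copy
from typing import Any, Dict, Optional

FINE_GRAINED_BETA = "fine-grained-tool-streaming-2025-05-14"


def with_fine_grained_tool_streaming(
    model_id: str,
    additional_fields: Optional[Dict[str, Any]],
    mcp_apps_enabled: bool,
) -> Dict[str, Any]:
    """Return request fields with the beta merged in, when applicable.

    Existing entries (for example a `thinking` block or `top_k`) are preserved;
    the input dict is never mutated.
    """
    fields = copy.deepcopy(additional_fields) if additional_fields else {}
    # Assumption: Bedrock Claude model IDs contain "anthropic.claude".
    if not mcp_apps_enabled or "anthropic.claude" not in model_id:
        return fields
    betas = list(fields.get("anthropic_beta", []))  # key name is an assumption
    if FINE_GRAINED_BETA not in betas:
        betas.append(FINE_GRAINED_BETA)
    fields["anthropic_beta"] = betas
    return fields


merged = with_fine_grained_tool_streaming(
    "anthropic.claude-3-haiku-20240307-v1:0",
    {"thinking": {"type": "enabled", "budget_tokens": 1024}},
    mcp_apps_enabled=True,
)
print(sorted(merged))  # ['anthropic_beta', 'thinking']
```

Gating on the host flag keeps opted-out environments on the default, JSON-validated tool input, as the commit message describes.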

Frontend: remove the host-side 200ms pacer that reconstructed a tour from
the burst — now counterproductive, as it would decouple the tour from the
real stream. Partials relay directly as they arrive, gated on
bridge.viewIsInitialized, with pre-handshake catch-up handled by the
getPartialToolInput init seed (avoids a preInitQueue re-burst). The
inputComplete blank-canvas fix (finality keyed on the real tool result,
not the success stub) is retained.

Tests: 4 new model_config cases for the beta (Claude/non-Claude, flag
on/off, thinking-merge); full backend suite green; frontend bridge/state/
assistant-message specs green; tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On reload the MCP App frame mounted (the UI resource is persisted and
rehydrated via the GET /messages uiResources sidecar) but rendered a blank
canvas: resolvedToolInput() only consulted the live stream parser and the
live-only captured partial, both empty after a refresh, so the frame sent
an empty tool-input. The arguments were actually available — GET /messages
returns the persisted toolUse.input and message-map preserves it
(...toolUse) — they just weren't forwarded to the frame.

Thread the persisted toolUse.input through the mcp_app_frame block into a
new toolInput frame input, and use it as the final fallback in
resolvedToolInput (live parser -> captured partial -> persisted input).
Also key inputFinal on a non-empty persisted input so an interrupted tool
(input persisted, no result) still renders. Inert on the live path: the
stream parser leaves an in-flight block's input empty until
content_block_stop, so toolInput is {} during streaming and never
pre-empts the tour. No tour is replayed on reload — the App snaps to the
complete input (the spec's tool-input final path).
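The fallback order (live parser -> captured partial -> persisted input) amounts to "first non-empty source wins". A minimal sketch, written in Python for illustration although the real `resolvedToolInput` is frontend TypeScript; the function and parameter names here are hypothetical.

```python
from typing import Any, Mapping


def resolve_tool_input(
    live_parsed: Mapping[str, Any],
    captured_partial: Mapping[str, Any],
    persisted: Mapping[str, Any],
) -> Mapping[str, Any]:
    """Pick the first non-empty source, in priority order."""
    for candidate in (live_parsed, captured_partial, persisted):
        if candidate:
            return candidate
    return {}


# Reload: parser and live-only partial are both empty, so the persisted input wins.
print(resolve_tool_input({}, {}, {"elements": "[...]"}))  # {'elements': '[...]'}
```

During live streaming all three sources are empty until the block finalizes, which is why the persisted input cannot pre-empt the partial-driven tour.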

Frontend-only, no backend change. Verified live: loading a persisted
Excalidraw session now renders the full diagram (was blank); the live tour
still animates progressively and completes. tsc clean; assistant-message /
bridge / state specs green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Render the MCP App frame with a connected header bar — server icon, server
name, and tool name (shimmering while the tool runs) — replacing the separate
minimized tool card, and stop the tool rail from flashing before the frame
mounts.

Backend:
- Capture `serverInfo` (name/title/icons) off the MCP `initialize` in
  `_UIExtensionClientSession` (neither the SDK session nor Strands retains it).
- `ui_resource` now carries `serverName` (serverInfo.title→name→`ui://`
  authority), `icon` (serverInfo icons; empty→glyph), and `toolName`.
- Two-phase emit at `content_block_start`: an instant header-only shell
  (no `resources/read`) then the full html-bearing resource, so the header
  shows immediately. Persist the new fields for reload survival.
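The `serverName` precedence (serverInfo.title, then name, then the `ui://` authority) can be sketched as below. The function name and the dict shape of `server_info` are assumptions for illustration; the example resource URI is made up.

```python
from typing import Any, Mapping, Optional
from urllib.parse import urlsplit


def derive_server_name(
    server_info: Optional[Mapping[str, Any]], resource_uri: str
) -> str:
    """Prefer the human-readable title, then the name, then the URI authority."""
    info = server_info or {}
    return info.get("title") or info.get("name") or urlsplit(resource_uri).netloc


print(derive_server_name(None, "ui://excalidraw/app.html"))  # excalidraw
```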

Frontend:
- Fold the header into `mcp-app-frame` (icon w/ glyph fallback, server·tool,
  `</>` request/response toggle); drop the redundant minimized card for MCP
  Apps.
- Gate the iframe mount on a non-empty html with a loading skeleton.
- Source the tool name from the `ui_resource` so the name + shimmer appear
  atomically with the frame's promotion.
- Fix the shimmer CSS: it built the gradient from `currentColor` while setting
  `color: transparent`, painting the clipped text invisible the whole time the
  tool ran; use explicit gray tones (light + dark).

Docs: CLAUDE.md SSE table updated for the enriched `ui_resource` payload.

Backend 3176 passed; frontend 1131 passed; tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…st (SEP-1865)

The MCP runtime protocol carries no icon — Excalidraw's `serverInfo`/tools
expose none, and its logo lives only in the MCPB bundle `manifest.json`
(`"icon": "docs/logo.png"`), which Claude inlines from the *installed bundle*
as a data: URI. But many deployable MCP-App servers also serve that manifest +
icon over HTTP at their origin (verified: https://mcp.excalidraw.com/manifest.json
+ /docs/logo.png), so resolve it server-side and base64-inline it onto the
`ui_resource` `icon` — mirroring what Claude does, with no admin config.

- `resolve_server_icon` / `get_cached_server_icon` (mcp_apps.py): fetch
  `<origin>/manifest.json`, resolve its `icon` SAME-ORIGIN only (bounds SSRF to
  the admin-trusted MCP origin; foreign icon URLs refused), GET the image (5s
  timeout, 256KB cap, image/* only), base64-inline as a data: URI. Cached per
  origin (incl. "" on miss), pre-warmed at `tools/list` so request-path
  resolvers stay cache lookups. `serverInfo.icons` still take precedence.
- Plumb `server_url` onto `UICapableMCPClient` (Strands only gets a transport
  callable) so the origin is derivable; passed from external_mcp_client.
- Size-gate persistence: a large data: URI icon is not stored (protects the
  400KB DynamoDB item limit alongside gzipped HTML), so it shows live but
  reloads to the generic glyph.
- Docs: CLAUDE.md `ui_resource` row updated for the served-manifest icon path.
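The two safety checks in the icon path (same-origin resolution and the image-only, size-capped inlining) can be illustrated without any network access. This is a sketch under assumptions, not the code of `resolve_server_icon`: the helper names are hypothetical, and the HTTP fetch, timeout, and cache are deliberately left out.

```python
import base64
from typing import Optional
from urllib.parse import urljoin, urlsplit

MAX_ICON_BYTES = 256 * 1024  # cap taken from the commit message


def resolve_same_origin_icon_url(origin: str, icon_ref: str) -> Optional[str]:
    """Resolve the manifest's `icon` against the MCP origin; refuse foreign URLs."""
    base = origin.rstrip("/") + "/"
    candidate = urljoin(base, icon_ref)
    base_parts, cand_parts = urlsplit(base), urlsplit(candidate)
    if (base_parts.scheme, base_parts.netloc) != (cand_parts.scheme, cand_parts.netloc):
        return None  # absolute or protocol-relative URL pointing elsewhere
    return candidate


def to_data_uri(content_type: str, body: bytes) -> str:
    """Inline an image as a data: URI, or return "" if it must be refused."""
    mime = content_type.split(";", 1)[0].strip().lower()
    if not mime.startswith("image/") or len(body) > MAX_ICON_BYTES:
        return ""
    return f"data:{mime};base64,{base64.b64encode(body).decode('ascii')}"


print(resolve_same_origin_icon_url("https://mcp.excalidraw.com", "docs/logo.png"))
# https://mcp.excalidraw.com/docs/logo.png
print(resolve_same_origin_icon_url("https://mcp.excalidraw.com", "https://evil.example/x.png"))
# None
```

Returning `""` on refusal matches the commit's note that misses are cached as empty strings, so a failed lookup is not retried on every request.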

Backend 3182 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@philmerrell philmerrell marked this pull request as ready for review June 1, 2026 23:12
@philmerrell philmerrell merged commit 714c50c into develop Jun 1, 2026
30 checks passed
@philmerrell philmerrell deleted the feature/mcp-apps-streaming-tool-input branch June 1, 2026 23:13