fix(agents): RPC calls no longer hang during connection churn #1746

Merged
threepointone merged 1 commit into main from fix/rpc-calls-hang-1738
Jun 12, 2026

@threepointone threepointone commented Jun 12, 2026

Summary

Fixes #1738.

usePartySocket replaces the underlying socket object whenever connection options change (async query refresh, enabled toggle, path change). The RPC layer didn't account for that, producing three ways for a call() promise to never settle:

  • a call issued against a stale agent reference (e.g. captured by a mount-time effect — the exact pattern in the issue) was buffered inside the permanently-closed old socket and never transmitted
  • a call transmitted just before replacement lost its response (RPC responses are connection-bound) with no rejection
  • the old socket's close event was unobservable after the swap, so "connection closed" cleanup never ran for it

useAgent

  • call(), agent.stub, and agent.setState route through a ref to the live socket, so stale references keep working after a replacement.
  • RPC requests are only handed to a socket once it's OPEN. Until then they're queued by the hook and flushed on the next open — including on a replacement socket. Queued requests were never transmitted, so flushing can't double-execute anything server-side.
  • Pending calls are tagged with the socket they were transmitted on. When a socket closes or is replaced, only calls sent on that socket reject with Connection closed; queued calls survive, and calls in flight on a newer socket are no longer spuriously rejected by a stale close event from an old one.
  • Destination guard: queued calls only follow the connection to the same agent instance. If the hook is re-pointed at a different address (the agent, name, basePath, or path props change) before a queued call was transmitted, the call rejects instead of executing against an instance it wasn't composed for. Credential-only changes (query refresh) still flush normally.
  • Default 30s timeout on non-streaming calls as a backstop so a genuinely lost response rejects instead of hanging. Configurable via the new defaultCallTimeout option (0 disables); an explicit per-call timeout always wins (timeout: 0 opts out). Streaming calls are exempt — long-lived streams are legitimate.
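
The queue-until-open and socket-tagging behavior described in the list above can be sketched roughly as follows. This is a minimal illustration, not the hook's actual code: `RpcRouter`, `SocketLike`, the wire message shape, and the handler names are all hypothetical, and the destination guard and timeout are left out to keep the sketch short.

```typescript
// Hypothetical minimal socket shape; a real PartySocket has far more surface.
interface SocketLike {
  readyState: number; // 1 means OPEN, as with the WebSocket constant
  send(data: string): void;
}

const OPEN = 1;

interface Settlers {
  resolve: (value: unknown) => void;
  reject: (error: Error) => void;
}

interface QueuedRpc extends Settlers {
  id: string;
  payload: string;
}

interface SentRpc extends Settlers {
  sentOn: SocketLike; // the socket this request was actually transmitted on
}

class RpcRouter {
  private live: SocketLike | null = null; // plays the role of the ref to the live socket
  private queue: QueuedRpc[] = [];
  private pending = new Map<string, SentRpc>();
  private nextId = 0;

  get queuedCount(): number {
    return this.queue.length;
  }

  get pendingCount(): number {
    return this.pending.size;
  }

  // Every call goes through the queue; it is transmitted only if the live socket is OPEN.
  call(method: string, args: unknown[]): Promise<unknown> {
    return new Promise<unknown>((resolve, reject) => {
      const id = String(++this.nextId);
      const payload = JSON.stringify({ type: "rpc", id, method, args }); // assumed wire shape
      this.queue.push({ id, payload, resolve, reject });
      this.flush();
    });
  }

  // Swap in a replacement socket. Calls sent on the old one can never get a response.
  replaceSocket(next: SocketLike): void {
    const previous = this.live;
    this.live = next;
    if (previous) this.rejectSentOn(previous);
    this.flush();
  }

  // Wire to the socket's open event.
  handleOpen(socket: SocketLike): void {
    if (socket === this.live) this.flush();
  }

  // Wire to any socket's close event; a stale socket only affects its own calls.
  handleClose(socket: SocketLike): void {
    this.rejectSentOn(socket);
  }

  private flush(): void {
    const socket = this.live;
    if (!socket || socket.readyState !== OPEN) return;
    for (const item of this.queue.splice(0)) {
      this.pending.set(item.id, { resolve: item.resolve, reject: item.reject, sentOn: socket });
      socket.send(item.payload);
    }
  }

  private rejectSentOn(socket: SocketLike): void {
    this.pending.forEach((entry, id) => {
      if (entry.sentOn !== socket) return;
      this.pending.delete(id);
      entry.reject(new Error("Connection closed"));
    });
  }
}
```

Because a queued request has never been handed to a socket, moving it onto a replacement socket cannot execute it twice on the server; only entries in `pending` carry that risk, which is why they are rejected rather than re-sent.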

AgentClient

  • Pending calls track whether their request was actually transmitted. On a transient disconnect, transmitted calls reject (their response can never arrive) while buffered calls stay pending — PartySocket re-sends them on reconnect. A permanent close rejects everything.
  • Same defaultCallTimeout backstop.
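
The timeout precedence shared by the hook and AgentClient could be resolved along these lines. The function name `resolveCallTimeout`, the `CallTimeoutOptions` type, and the boolean `streaming` flag are assumptions for illustration; only `defaultCallTimeout`, the per-call `timeout`, the 30s default, and the meaning of `0` come from the description above.

```typescript
// Hypothetical option shape; `streaming` stands in for however a call is marked as a stream.
interface CallTimeoutOptions {
  timeout?: number; // explicit per-call timeout in ms; 0 opts out
  streaming?: boolean;
}

const DEFAULT_CALL_TIMEOUT_MS = 30000;

// Returns the timeout to apply in ms, or undefined when the call should not time out.
function resolveCallTimeout(
  options: CallTimeoutOptions,
  defaultCallTimeout: number = DEFAULT_CALL_TIMEOUT_MS
): number | undefined {
  // An explicit per-call timeout always wins, including `0` meaning "no timeout".
  if (options.timeout !== undefined) {
    return options.timeout > 0 ? options.timeout : undefined;
  }
  // Streaming calls are exempt from the default backstop.
  if (options.streaming) return undefined;
  // Otherwise fall back to the configured default; 0 disables it.
  return defaultCallTimeout > 0 ? defaultCallTimeout : undefined;
}
```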

Both now console.warn when an RPC response arrives with no matching pending call (e.g. after a timeout) instead of silently discarding it.
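
As a rough sketch of the unmatched-response handling, assuming pending calls are kept in a map keyed by request id (the function name, the map shape, and the warning text are hypothetical):

```typescript
type SettleFn = (result: unknown) => void;

// Delivers a response to its pending call, or warns when no such call exists
// (for example because it already timed out). Returns whether it was delivered.
function deliverResponse(
  pending: Map<string, SettleFn>,
  id: string,
  result: unknown,
  warn: (message: string) => void = (message) => console.warn(message)
): boolean {
  const settle = pending.get(id);
  if (!settle) {
    warn(`RPC response for unknown call ${id} was dropped; the call may have timed out`);
    return false;
  }
  pending.delete(id);
  settle(result);
  return true;
}
```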

Upstream

Companion partysocket PR (cloudflare/partykit#403) fixes the underlying buffered-message loss and unobservable close at the library level — synchronous close events, send() returning transmit status, drainQueuedMessages(), and destination-guarded buffer transfer in the hooks. This PR is self-contained and doesn't depend on it: the hook-level queue here never relies on PartySocket's internal buffer. The agents react suite was also smoke-tested against the new partysocket build (96/96 passing), so the future dependency bump is safe.

Test plan

  • New rpc-robustness.test.tsx suite (9 tests): mount-time call resolves; stale agent reference works after socket replacement; queued calls flush onto a replacement socket; in-flight calls on a replaced socket reject promptly; destination guard rejects queued calls when the agent address changes; default timeout fires; timeout: 0 opts out; streaming calls exempt from the backstop; dropped-response warning
  • New AgentClient tests in client.test.ts (6 tests): transmitted vs. buffered rejection on transient disconnect, buffered calls surviving reconnect, default timeout, timeout: 0, streaming exemption, dropped-response warning
  • Verified the new tests reproduce the original hang without the fix (calls hang for 500+ seconds)
  • Full react test project passes (96 tests, 6 files); pnpm run check and nx affected -t test (9 projects) green

Changeset included (agents minor).

Made with Cursor


usePartySocket replaces the underlying socket object whenever connection
options change (async query refresh, enabled toggle, path change). The
RPC layer didn't account for that:

- a call issued against a stale `agent` reference (e.g. captured by a
  mount-time effect) was buffered inside the permanently-closed old
  socket and its promise never settled
- a call transmitted just before replacement lost its response — RPC
  responses are connection-bound — with no rejection either
- the old socket's close event was unobservable after the swap, so
  "connection closed" cleanup never ran for it

useAgent:

- call(), agent.stub, and agent.setState now route through a ref to the
  live socket, so stale references keep working after a replacement.
- RPC requests are only handed to a socket once it's OPEN. Until then
  they're queued by the hook and flushed on the next open — including on
  a replacement socket. Queued requests were never transmitted, so
  flushing can't double-execute anything server-side.
- Pending calls are tagged with the socket they were transmitted on.
  When a socket closes or is replaced, only calls sent on *that* socket
  are rejected ("Connection closed"); queued calls survive, and calls in
  flight on a newer socket are no longer spuriously rejected by a stale
  close event from an old one.
- Destination guard: queued calls only follow the connection to the same
  agent instance. If the hook is re-pointed at a different address
  (agent, name, basePath, or path props change) before a queued call was
  transmitted, the call is rejected instead of executing against an
  instance it wasn't composed for.
- Non-streaming calls get a default 30s timeout as a backstop so lost
  responses reject instead of hanging. Configurable via the new
  `defaultCallTimeout` option (0 disables); an explicit per-call
  `timeout` always wins (`timeout: 0` opts out). Streaming calls are
  exempt.
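
A minimal sketch of the destination guard, assuming each queued call records the address it was composed for. `AgentAddress`, `addressKey`, `GuardedCall`, `flushGuarded`, and the rejection message are hypothetical names for illustration; the four address fields mirror the props listed above. The query is deliberately left out of the key, so a credential refresh alone does not reject anything.

```typescript
// The props that identify which agent instance a call was composed for.
interface AgentAddress {
  agent: string;
  name: string;
  basePath?: string;
  path?: string;
}

// Stable key for comparing destinations; query/credentials are intentionally excluded.
function addressKey(address: AgentAddress): string {
  return JSON.stringify([
    address.agent,
    address.name,
    address.basePath ?? null,
    address.path ?? null
  ]);
}

interface GuardedCall {
  destination: string; // addressKey() captured when the call was queued
  send: () => void;
  reject: (error: Error) => void;
}

// Flush queued calls onto the current connection, rejecting any that were
// composed for a different agent instance.
function flushGuarded(queue: GuardedCall[], current: AgentAddress): void {
  const currentKey = addressKey(current);
  for (const item of queue.splice(0)) {
    if (item.destination === currentKey) {
      item.send();
    } else {
      item.reject(new Error("Agent address changed before the call was sent"));
    }
  }
}
```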

AgentClient:

- Pending calls track whether their request was actually transmitted.
  On a transient disconnect, transmitted calls are rejected (their
  response can never arrive) while buffered calls stay pending —
  PartySocket re-sends them on reconnect. A permanent close rejects
  everything.
- Same `defaultCallTimeout` backstop as the hook.
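
The transmitted-versus-buffered distinction in AgentClient might look like the following sketch. `TrackedCall` and `handleDisconnect` are hypothetical names, and how the client learns that a request was transmitted is not modeled here; the sketch only assumes a boolean is recorded per pending call.

```typescript
interface TrackedCall {
  transmitted: boolean; // true once the request has actually gone out on the wire
  reject: (error: Error) => void;
}

// On a transient disconnect, only transmitted calls reject, because their
// response can never arrive; buffered calls stay pending for the reconnect.
// On a permanent close, every pending call rejects.
function handleDisconnect(pending: Map<string, TrackedCall>, permanent: boolean): void {
  pending.forEach((call, id) => {
    if (permanent || call.transmitted) {
      pending.delete(id);
      call.reject(new Error("Connection closed"));
    }
  });
}
```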

Both now log a console.warn when an RPC response arrives with no
matching pending call (e.g. after a timeout) instead of silently
discarding it.

New rpc-robustness react test suite covers mount-time calls, stale
references after replacement, queued-call flush, in-flight rejection on
replacement, the destination guard, default-timeout behavior (including
timeout: 0 and streaming exemption), and the dropped-response warning;
client.test.ts gains the AgentClient equivalents.

Fixes #1738

Co-authored-by: Cursor <cursoragent@cursor.com>

changeset-bot Bot commented Jun 12, 2026

🦋 Changeset detected

Latest commit: 0238e3e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| agents | Minor |

pkg-pr-new Bot commented Jun 12, 2026

agents

npm i https://pkg.pr.new/agents@1746

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1746

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1746

create-think

npm i https://pkg.pr.new/create-think@1746

hono-agents

npm i https://pkg.pr.new/hono-agents@1746

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1746

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1746

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1746

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1746

commit: 0238e3e

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

@threepointone threepointone merged commit e45b5ec into main Jun 12, 2026
5 checks passed
@threepointone threepointone deleted the fix/rpc-calls-hang-1738 branch June 12, 2026 11:23
@github-actions github-actions Bot mentioned this pull request Jun 12, 2026

Development

Successfully merging this pull request may close these issues.

useAgent: RPC calls issued during initial connect can hang forever (response silently dropped)

1 participant