[Feat]: Add RetryTransport for automatic retry with exponential backoff #871

@cchinchilla-dev

Is your feature request related to a problem? Please describe.

The SDK's transports (JsonRpcTransport, RestTransport, GrpcTransport) raise immediately on transient failures (network errors, timeouts, rate limits, 5xx responses, gRPC UNAVAILABLE/RESOURCE_EXHAUSTED) with no built-in retry mechanism. Every caller has to reimplement the same retry/backoff loop, and to do it correctly they have to inspect __cause__ chains because A2AClientError doesn't expose HTTP status codes or gRPC codes directly.

Describe the solution you'd like

A RetryTransport decorator wrapping any ClientTransport, mirroring the existing TenantTransportDecorator pattern:

# Wrap an existing transport; RetryTransport is the class proposed in this issue.
inner = JsonRpcTransport(httpx_client=client, agent_card=card)
transport = RetryTransport(base=inner, max_retries=3)
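To make the intended behavior concrete, here is a minimal sketch of the retry loop such a decorator could run around a wrapped transport's coroutine methods. It is not the proposed implementation: RetrySketch, call(), base_delay, max_delay, and the injectable sleep are hypothetical names used for illustration, and the backoff is a plain doubling schedule with no jitter.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class RetrySketch:
    """Hypothetical stand-in for the proposed RetryTransport (not SDK code)."""

    def __init__(
        self,
        base: Any,
        max_retries: int = 3,
        base_delay: float = 0.5,  # assumed parameter, seconds
        max_delay: float = 8.0,  # assumed parameter, seconds
        retry_predicate: Callable[[BaseException], bool] = lambda exc: True,
        on_retry: Optional[Callable[[int, BaseException, float], None]] = None,
        sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    ) -> None:
        self.base = base
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.retry_predicate = retry_predicate
        self.on_retry = on_retry
        self.sleep = sleep

    def delay_for(self, attempt: int) -> float:
        """Exponential backoff: base_delay * 2**attempt, capped at max_delay."""
        return min(self.max_delay, self.base_delay * 2**attempt)

    async def call(self, method_name: str, *args: Any, **kwargs: Any) -> Any:
        """Invoke a coroutine method on the wrapped object, retrying on failure."""
        attempt = 0
        while True:
            try:
                return await getattr(self.base, method_name)(*args, **kwargs)
            except Exception as exc:
                # Give up when retries are exhausted or the error is not retryable.
                if attempt >= self.max_retries or not self.retry_predicate(exc):
                    raise
                delay = self.delay_for(attempt)
                if self.on_retry is not None:
                    self.on_retry(attempt + 1, exc, delay)
                await self.sleep(delay)
                attempt += 1
```

In this sketch max_retries counts retries after the first attempt, so max_retries=3 allows up to four calls in total. Whether the real class should count the same way is an open design choice.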

The default predicate retries on:

  • A2AClientTimeoutError (always).
  • A2AClientError chained from httpx.RequestError, or from httpx.HTTPStatusError with status 408, 429, 502, 503, or 504.
  • A2AClientError chained from grpc.aio.AioRpcError with code UNAVAILABLE or RESOURCE_EXHAUSTED.

Domain errors (TaskNotFoundError, etc.) are never retried.
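The rules above could be sketched roughly as follows. To keep the example runnable without the SDK, httpx, or grpc installed, every exception class here is a local stand-in: the Fake* classes only mimic httpx.RequestError, httpx.HTTPStatusError, and grpc.aio.AioRpcError, and the attribute names status_code and code_name are assumptions, not the real libraries' APIs. The function names iter_causes and default_should_retry are also hypothetical.

```python
from typing import Iterator

RETRYABLE_HTTP_STATUSES = frozenset({408, 429, 502, 503, 504})
RETRYABLE_GRPC_CODES = frozenset({"UNAVAILABLE", "RESOURCE_EXHAUSTED"})


# --- Stand-ins so the sketch is self-contained (NOT the real classes) ---
class A2AClientError(Exception):
    """Stand-in for the SDK's client error."""


class A2AClientTimeoutError(A2AClientError):
    """Stand-in for the SDK's timeout error."""


class TaskNotFoundError(Exception):
    """Stand-in for a domain error."""


class FakeRequestError(Exception):
    """Mimics httpx.RequestError (network-level failure)."""


class FakeHTTPStatusError(Exception):
    """Mimics httpx.HTTPStatusError; status_code is an assumed attribute."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class FakeRpcError(Exception):
    """Mimics grpc.aio.AioRpcError; code_name is an assumed attribute."""

    def __init__(self, code_name: str) -> None:
        super().__init__(code_name)
        self.code_name = code_name


def iter_causes(exc: BaseException) -> Iterator[BaseException]:
    """Walk the explicit __cause__ chain, guarding against cycles."""
    seen = set()
    cause = exc.__cause__
    while cause is not None and id(cause) not in seen:
        seen.add(id(cause))
        yield cause
        cause = cause.__cause__


def default_should_retry(exc: BaseException) -> bool:
    """Return True only for the transient failures listed above."""
    if isinstance(exc, A2AClientTimeoutError):
        return True
    if not isinstance(exc, A2AClientError):
        return False  # domain errors and anything unknown are not retried
    for cause in iter_causes(exc):
        if isinstance(cause, FakeRequestError):
            return True
        if isinstance(cause, FakeHTTPStatusError):
            return cause.status_code in RETRYABLE_HTTP_STATUSES
        if isinstance(cause, FakeRpcError):
            return cause.code_name in RETRYABLE_GRPC_CODES
    return False


def wrap(cause: BaseException) -> A2AClientError:
    """Build an A2AClientError chained from `cause`, as a transport would."""
    try:
        raise A2AClientError("transport failure") from cause
    except A2AClientError as exc:
        return exc
```

With these stand-ins, default_should_retry(wrap(FakeHTTPStatusError(503))) is True, while a 404 or a bare A2AClientError with no cause is not retried.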

retry_predicate is configurable for custom logic, and on_retry is exposed as a hook for logging/metrics. Streaming methods (send_message_streaming, subscribe) only retry before the first event is yielded.
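The streaming rule can be illustrated with a small async-generator wrapper. This is a sketch under assumptions, not the proposed code: retry_stream, open_stream, should_retry, base_delay, and the injectable sleep are hypothetical names, and open_stream is assumed to be a zero-argument callable that starts a fresh stream on each call.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Optional


async def retry_stream(
    open_stream: Callable[[], AsyncIterator[Any]],
    max_retries: int = 3,
    should_retry: Callable[[BaseException], bool] = lambda exc: True,
    on_retry: Optional[Callable[[int, BaseException, float], None]] = None,
    base_delay: float = 0.5,  # assumed parameter, seconds
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> AsyncIterator[Any]:
    """Re-open a stream on failure, but only while no event has been yielded."""
    attempt = 0
    while True:
        yielded = False
        try:
            async for event in open_stream():
                yielded = True  # from here on, a failure is surfaced, not retried
                yield event
            return  # stream finished normally
        except Exception as exc:
            if yielded or attempt >= max_retries or not should_retry(exc):
                raise
            delay = base_delay * 2**attempt
            if on_retry is not None:
                on_retry(attempt + 1, exc, delay)  # hook for logging/metrics
            await sleep(delay)
            attempt += 1
```

Once the caller has received an event, the wrapper re-raises instead of re-opening the stream, so the caller never sees the start of a second stream spliced onto a partial first one.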

Describe alternatives you've considered

  • Retry via ClientCallInterceptor: not feasible, because after() only fires on successful results, so interceptors never see exceptions.
  • Transport-specific retry (httpx custom transport, gRPC retryPolicy / UnaryUnaryClientInterceptor): operates below the SDK's exception layer, requires two separate configurations, and doesn't cover all three transports uniformly.
  • Retry inside each transport implementation: triples the logic across JsonRpcTransport, RestTransport, and GrpcTransport, and forces every future transport to reimplement it.

Additional context

Purely additive: a new RetryTransport class, tests, and one export change. No new dependencies are needed; the grpc import is conditional, matching the SDK's existing pattern. ClientFactory integration is left as a follow-up.

Labels

component: client (issues related to transport logic and configuration for external apps connecting to A2A agents)
