Is your feature request related to a problem? Please describe.
The SDK's transports (`JsonRpcTransport`, `RestTransport`, `GrpcTransport`) raise immediately on transient failures (network errors, timeouts, rate limits, 5xx responses, gRPC `UNAVAILABLE`/`RESOURCE_EXHAUSTED`) with no built-in retry mechanism. Every caller has to reimplement the same retry/backoff loop, and to do it correctly they have to inspect `__cause__` chains because `A2AClientError` doesn't expose HTTP status codes or gRPC codes directly.
Describe the solution you'd like
A `RetryTransport` decorator wrapping any `ClientTransport`, mirroring the existing `TenantTransportDecorator` pattern:

```python
inner = JsonRpcTransport(httpx_client=client, agent_card=card)
transport = RetryTransport(base=inner, max_retries=3)
```
The default predicate retries on:

- `A2AClientTimeoutError` (always).
- `A2AClientError` chained from `httpx.RequestError`, or from `httpx.HTTPStatusError` with status 408/429/502/503/504.
- `A2AClientError` chained from `grpc.aio.AioRpcError` with code `UNAVAILABLE` or `RESOURCE_EXHAUSTED`.

Domain errors (`TaskNotFoundError`, etc.) are never retried.
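A minimal sketch of how such a default predicate could inspect the `__cause__` chain. The exception classes below are self-contained stand-ins so the example runs on its own (in the SDK, the real `A2AClientError`, `httpx`, and `grpc` types would be checked instead), and the helper name `default_retry_predicate` is hypothetical:

```python
# Stand-ins for the SDK's exception hierarchy.
class A2AClientError(Exception): ...
class A2AClientTimeoutError(A2AClientError): ...

class TransportStatusError(Exception):
    """Stand-in for httpx.HTTPStatusError: carries an HTTP status code."""
    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code

RETRYABLE_HTTP = {408, 429, 502, 503, 504}

def default_retry_predicate(exc: Exception) -> bool:
    # Timeouts are always worth retrying.
    if isinstance(exc, A2AClientTimeoutError):
        return True
    # Other client errors are retried only when the chained cause
    # (exc.__cause__) is a retryable transport-level failure.
    if isinstance(exc, A2AClientError):
        cause = exc.__cause__
        if isinstance(cause, TransportStatusError):
            return cause.status_code in RETRYABLE_HTTP
    return False
```

Because the decision hinges on `__cause__`, callers no longer need to unwrap the chain themselves; domain errors (which are never chained from transport failures) fall through to `False`.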
`retry_predicate` is configurable for custom logic, and `on_retry` is exposed as a hook for logging/metrics. Streaming methods (`send_message_streaming`, `subscribe`) only retry before the first event is yielded.
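The core retry loop and the streaming semantics could look roughly like the sketch below. The constructor parameters (`base`, `max_retries`, `retry_predicate`, `on_retry`) follow the proposal, but the backoff policy and `base_delay` parameter are illustrative assumptions, not a definitive implementation:

```python
import asyncio

class RetryTransport:
    """Sketch: retries a wrapped call when the predicate deems the failure transient."""

    def __init__(self, base, max_retries=3, retry_predicate=None,
                 on_retry=None, base_delay=0.1):
        self.base = base
        self.max_retries = max_retries
        self.retry_predicate = retry_predicate or (lambda exc: False)
        self.on_retry = on_retry  # hook for logging/metrics
        self.base_delay = base_delay

    async def _call_with_retry(self, func, *args, **kwargs):
        for attempt in range(self.max_retries + 1):
            try:
                return await func(*args, **kwargs)
            except Exception as exc:
                if attempt == self.max_retries or not self.retry_predicate(exc):
                    raise
                if self.on_retry:
                    self.on_retry(attempt, exc)
                # Exponential backoff between attempts.
                await asyncio.sleep(self.base_delay * 2 ** attempt)

    async def send_message(self, *args, **kwargs):
        return await self._call_with_retry(self.base.send_message, *args, **kwargs)

    async def send_message_streaming(self, *args, **kwargs):
        # Streaming: retry only until the first event is yielded; after that,
        # failures propagate, since a half-consumed stream cannot be replayed.
        for attempt in range(self.max_retries + 1):
            stream = self.base.send_message_streaming(*args, **kwargs)
            try:
                first = await stream.__anext__()
            except StopAsyncIteration:
                return
            except Exception as exc:
                if attempt == self.max_retries or not self.retry_predicate(exc):
                    raise
                if self.on_retry:
                    self.on_retry(attempt, exc)
                await asyncio.sleep(self.base_delay * 2 ** attempt)
                continue
            yield first
            async for event in stream:
                yield event
            return
```

Because the decorator only touches the `ClientTransport` surface, it composes with other decorators (e.g. the existing `TenantTransportDecorator`) in either order.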
Describe alternatives you've considered
- Retry via `ClientCallInterceptor`: not feasible; `after()` only fires on successful results, so interceptors never see exceptions.
- Transport-specific retry (httpx custom transport, gRPC `retryPolicy` / `UnaryUnaryClientInterceptor`): operates below the SDK's exception layer, requires two separate configurations, and doesn't cover all three transports uniformly.
- Retry inside each transport implementation: triples the logic across `JsonRpcTransport`, `RestTransport`, and `GrpcTransport`, and forces every future transport to reimplement it.
Additional context
Purely additive: a new `RetryTransport` class, tests, and one export change. No new dependencies; the `grpc` import is conditional, matching the SDK's existing pattern. `ClientFactory` integration is left as a follow-up.
Code of Conduct