
Temporal SDK gRPC calls fail with DEADLINE_EXCEEDED after node restart (GraalVM / K8s) #2840

@olegdibrov

Description

When a Spring Boot application using the Temporal Java SDK is compiled as a GraalVM native image and deployed in Kubernetes, the application fails to communicate with Temporal after node restarts or pod rescheduling.

The same application works correctly:

  • ✅ On JVM (non-native)
  • ✅ In Docker outside Kubernetes
  • ❌ Fails in Kubernetes when running as GraalVM native image

This suggests a compatibility issue between the Temporal Java SDK and the GraalVM native-image runtime, potentially related to:

  • gRPC channel lifecycle
  • DNS resolution / service discovery
  • resource or reflection configuration
  • connection reuse after pod rescheduling

Steps to Reproduce

  1. Install Temporal using the official Helm chart: https://github.com/temporalio/helm-charts
  2. Deploy the demo application from https://github.com/olegdibrov/temporal-graalvm-k8s
    Install the app:
    helm install control {path/to/chart}

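Application logic (executed on startup):

```java
import io.temporal.api.workflowservice.v1.DescribeNamespaceResponse;
import io.temporal.api.workflowservice.v1.ListNamespacesRequest;
import java.util.List;

// workflowClient: the application's injected WorkflowClient; log: an SLF4J logger
List<DescribeNamespaceResponse> namespaces = workflowClient
    .getWorkflowServiceStubs()
    .blockingStub()
    .listNamespaces(ListNamespacesRequest.newBuilder().build())
    .getNamespacesList();
log.info("Found {} namespaces", namespaces.size());
```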
  3. Restart the Kubernetes node, or drain it:
    kubectl drain <node> --ignore-daemonsets
  4. Observe the application's startup behavior.
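To measure the failure window after the drain, the startup `listNamespaces` call could be wrapped in a loop that retries with capped exponential backoff and logs the elapsed time of each attempt. This is a JDK-only sketch with hypothetical names (`StartupRetry`, `timeUntilSuccess`); the lambda in `main` merely simulates a call that fails twice. It is a diagnostic aid or stopgap, not a fix: it keeps the application from failing fast but does not explain the delay.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Callable;

public final class StartupRetry {

    /**
     * Hypothetical helper: invokes {@code call} until it succeeds or the overall
     * {@code budget} would be exceeded, doubling the pause between attempts up to
     * {@code maxBackoff}. Returns the elapsed time until the first success.
     */
    public static <T> Duration timeUntilSuccess(Callable<T> call, Duration initialBackoff,
                                                Duration maxBackoff, Duration budget)
            throws Exception {
        Instant start = Instant.now();
        Duration backoff = initialBackoff;
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                call.call();
                Duration elapsed = Duration.between(start, Instant.now());
                System.out.printf("attempt %d succeeded after %d ms%n", attempt, elapsed.toMillis());
                return elapsed;
            } catch (Exception e) {
                Duration elapsed = Duration.between(start, Instant.now());
                System.out.printf("attempt %d failed after %d ms: %s%n", attempt, elapsed.toMillis(), e);
                if (elapsed.plus(backoff).compareTo(budget) > 0) {
                    throw e; // give up: the next pause would exceed the overall budget
                }
                Thread.sleep(backoff.toMillis());
                // Exponential backoff, capped at maxBackoff
                backoff = backoff.multipliedBy(2);
                if (backoff.compareTo(maxBackoff) > 0) {
                    backoff = maxBackoff;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated stand-in for listNamespaces: fails twice, then succeeds.
        int[] calls = {0};
        timeUntilSuccess(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated DEADLINE_EXCEEDED");
            }
            return "ok";
        }, Duration.ofMillis(10), Duration.ofMillis(100), Duration.ofSeconds(5));
    }
}
```

In the application, the `Callable` would wrap the `workflowClient.getWorkflowServiceStubs().blockingStub().listNamespaces(...)` call, with a budget above 30 minutes if the goal is to capture the whole window in the logs.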

Actual Behavior

The application fails to start for 10–30 minutes, logging repeated errors:

io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline CallOptions was exceeded after 9.999s

  • The Temporal cluster is healthy throughout (all pods ready)
  • Eventually, the application may recover on its own, without a restart

Expected Behavior

  • The application should reconnect to Temporal immediately after a pod restart
  • listNamespaces should succeed consistently
  • No prolonged unavailability while the Temporal cluster is healthy

Important Observations

  • The issue occurs only in the GraalVM native image
  • Does NOT reproduce on JVM
  • Does NOT reproduce outside Kubernetes
  • Temporal services are reachable and healthy during failure window
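The observation that Temporal is reachable during the failure window could be checked from inside the affected pod with a plain TCP connect to the frontend's gRPC port (7233 by default). This JDK-only sketch uses hypothetical names: `TcpProbe`, and `temporal-frontend` as the Service host, whose real name depends on the Helm release. A successful connect shows that the Service address is reachable from the pod; it does not show that the application's existing gRPC channel is using that address.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public final class TcpProbe {

    /** Returns true if a plain TCP connection to host:port opens within timeoutMillis. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        // Constructing InetSocketAddress with a host name resolves it immediately,
        // so this also exercises the runtime's DNS lookup path.
        InetSocketAddress address = new InetSocketAddress(host, port);
        if (address.isUnresolved()) {
            System.out.println("DNS lookup failed for " + host);
            return false;
        }
        try (Socket socket = new Socket()) {
            socket.connect(address, timeoutMillis);
            System.out.println("connected to " + address);
            return true;
        } catch (IOException e) {
            System.out.println("cannot connect to " + address + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults: pass the real frontend Service host and port as arguments.
        String host = args.length > 0 ? args[0] : "temporal-frontend";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7233;
        System.exit(canConnect(host, port, 2000) ? 0 : 1);
    }
}
```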

The ~10–30 minute delay suggests:

  • stale DNS cache
  • broken gRPC channel reuse
  • or a native-image-related networking issue
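One way to test the stale-DNS hypothesis from inside the pod is to pin the JDK's positive lookup cache to a short TTL with the standard `networkaddress.cache.ttl` security property and log whenever the resolved address set for the frontend host changes. The sketch assumes a hypothetical Service name, `temporal-frontend`, and an arbitrary 10-second TTL. Whether a GraalVM native image honors this property at runtime the same way the JVM does is part of what it would help establish, not something it guarantees. It observes only `InetAddress` lookups; gRPC's own name resolver may cache results separately.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.Set;
import java.util.TreeSet;

public final class DnsWatch {

    /** Resolves host and returns its addresses as a sorted set; empty if the lookup fails. */
    public static Set<String> resolve(String host) {
        Set<String> result = new TreeSet<>();
        try {
            for (InetAddress a : InetAddress.getAllByName(host)) {
                result.add(a.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Leave the set empty: an empty result signals a failed lookup.
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // Set before the first name lookup in the process, since the cache
        // policy is typically read only once. 10 s is an arbitrary example value.
        Security.setProperty("networkaddress.cache.ttl", "10");

        String host = args.length > 0 ? args[0] : "temporal-frontend"; // hypothetical Service name
        Set<String> previous = null;
        for (int i = 0; i < 60; i++) {
            Set<String> current = resolve(host);
            if (!current.equals(previous)) {
                // t is approximate: it ignores the time spent in each lookup
                System.out.printf("t=%ds %s -> %s%n", i * 15, host, current);
                previous = current;
            }
            Thread.sleep(15_000);
        }
    }
}
```

If the frontend is a regular ClusterIP Service, its IP normally stays fixed for the lifetime of the Service, including across pod rescheduling, so an unchanged address set during the failure window would point away from stale DNS and toward the gRPC channel itself.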

Environment

  • Temporal SDK: 1.33.0
  • GraalVM: 25
  • Java: 25
  • Spring Boot: 3.5.13
  • Kubernetes: v1.30.5

Logs

Example error:

io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline CallOptions was exceeded after 9.999786125s
