Skip to content

fix(grpc): resilient SwooleGrpcTransport — mutex, retry, reconnect and PHP 8.2 compat#25

Open
lucas-angeli-gimenes wants to merge 3 commits into
mainfrom
improve_grpc
Open

fix(grpc): resilient SwooleGrpcTransport — mutex, retry, reconnect and PHP 8.2 compat#25
lucas-angeli-gimenes wants to merge 3 commits into
mainfrom
improve_grpc

Conversation

@lucas-angeli-gimenes
Copy link
Copy Markdown
Collaborator

@lucas-angeli-gimenes lucas-angeli-gimenes commented Apr 23, 2026

Description

Fixes three production bugs in SwooleGrpcTransport reported via error logs, plus a suite of structural improvements identified during code review.

Bug fixes

Error Root cause Fix
Broken pipe — batch lost Stale HTTP/2 connection not detected; no retry Retry loop on send failure (attemptSend): reset client + reconnect up to maxRetries
Creation of dynamic property ... deprecated — recv aborted Swoole sets $serverLastStreamId dynamically; Hyperf's ErrorExceptionHandler converts E_DEPRECATEDErrorException @$client->recv() suppresses the notice before Hyperf's handler sees it
Race condition — concurrent coroutines corrupt shared $client No synchronisation on send() Channel(1) as a coroutine-safe mutex; shutdown() closes the channel to unblock waiting coroutines without deadlock

Recv failures are not retried intentionally: once send() succeeds, delivery is uncertain and retrying DELTA metric exports risks duplication.

Structural improvements

  • retryDelay / maxRetries now wired from factory → transport constructor (previously silently discarded)
  • Endpoint fallback (http://localhost:4317) added to Trace and Metric gRPC factories
  • 'application/x-protobuf' string literals replaced with ContentTypes::PROTOBUF constant
  • getClient() now closes stale (disconnected) clients before reconnecting
  • resetClient() closes the socket before nulling the reference (avoids leaking dead sockets)
  • Compression type validated in factory — prevents compressed_flag=1 with uncompressed payload
  • grpc-status cast to int before comparison (more robust against whitespace / non-conformant servers)
  • attemptSend extracted into doRequest to reduce cyclomatic complexity below PHPMD threshold
  • gRPC framing comment added to packMessage

Also in this PR

  • feat(shutdown): flush OTel on AfterHandle / AfterExecute events so CLI commands flush spans on exit
  • style: fix method ordering in ChannelBatchSpanProcessor per PHP-CS-Fixer rule

Type of change

  • Bug fix
  • Performance/Refactor

Checklist

  • Title follows Conventional Commits
  • Tests added/updated
  • composer test passes locally (184 tests, 461 assertions)
  • Documentation updated (README/Docs)
  • No breaking changes

Related issues

#23

lucas-angeli-gimenes and others added 3 commits April 23, 2026 12:09
- Add Channel(1) mutex to serialize concurrent coroutine sends; shutdown
  closes the channel to unblock waiting coroutines without deadlock
- Extract attemptSend() retry loop (send failure → reset + retry up to
  maxRetries; recv failure → ErrorFuture immediately to avoid duplicate
  DELTA metric exports)
- Extract doRequest() to reduce cyclomatic complexity below threshold
- Reset HTTP/2 client on send/recv failure so next call reconnects
- Fix getClient(): close stale (disconnected) client before reconnecting
- Suppress PHP 8.2 E_DEPRECATED from Swoole's dynamic $serverLastStreamId
  property in recv() using @ operator (Hyperf ErrorExceptionHandler converts
  deprecation notices to ErrorException, aborting the recv)
- Validate unsupported compression types in factory (prevents compressed
  flag=1 with uncompressed payload)
- Wire retryDelay and maxRetries from factory to transport constructor
- Add endpoint fallback in Trace and Metric gRPC exporter factories
- Replace 'application/x-protobuf' literal with ContentTypes::PROTOBUF
- Cast grpc-status to int before comparison (more robust against whitespace)
- Add tests: default endpoint fallback, retry parameters, invalid compression

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ensures traces and metrics are flushed when a Hyperf CLI command exits,
in addition to the existing OnWorkerExit handler.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move ensureInitialized() after public methods per PHP-CS-Fixer
ordered_class_elements rule.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 23, 2026 15:11
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves reliability and PHP 8.2 compatibility of the Swoole-based OTLP gRPC transport, and ensures telemetry is flushed on CLI shutdown events.

Changes:

  • Add coroutine-safe send serialization plus retry/reconnect logic to SwooleGrpcTransport, and wire retryDelay/maxRetries through the factory.
  • Add default OTLP gRPC endpoints in trace/metric factories and replace protobuf content-type string literals with ContentTypes::PROTOBUF.
  • Flush/shutdown OTel providers on CLI command lifecycle events, plus associated unit test updates.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/Transport/SwooleGrpcTransport.php Adds mutex-based serialization and retry/reconnect logic; refactors request execution.
src/Transport/SwooleGrpcTransportFactory.php Validates compression type and forwards retry parameters to the transport.
src/Listener/OtelShutdownListener.php Extends listener to CLI command events to trigger provider shutdown/flush.
src/SpanProcessor/ChannelBatchSpanProcessor.php Reorders methods to satisfy style/tooling rules.
src/Factory/Trace/Exporter/OtlpGrpcTraceExporterFactory.php Adds default endpoint fallback and uses ContentTypes::PROTOBUF.
src/Factory/Metric/Exporter/OtlpGrpcMetricExporterFactory.php Adds default endpoint fallback and uses ContentTypes::PROTOBUF.
src/Factory/Log/Exporter/OtlpGrpcLogExporterFactory.php Uses ContentTypes::PROTOBUF (and already applies default endpoint fallback).
tests/Unit/Transport/SwooleGrpcTransportFactoryTest.php Adds coverage for retry parameter wiring and unsupported compression validation.
tests/Unit/Listener/OtelShutdownListenerTest.php Updates expected listened events to include CLI command lifecycle events.
tests/Unit/Factory/Trace/Exporter/OtlpGrpcTraceExporterFactoryTest.php Adds test ensuring factory works with default endpoint when options are empty.
tests/Unit/Factory/Metric/Exporter/OtlpGrpcMetricExporterFactoryTest.php Adds test ensuring factory works with default endpoint when options are empty.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 67 to +71
timeout: $timeout,
ssl: $parsed['ssl'],
compression: $compressionType,
retryDelay: $retryDelay,
maxRetries: $maxRetries,
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retryDelay/maxRetries are now forwarded into the transport, but there’s no validation that they’re non-negative. A negative maxRetries will cause the retry loop to never run (no send attempt), and a negative retryDelay could lead to confusing behavior. Consider validating (and rejecting) negative values here before constructing the transport.

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +108
for ($attempt = 0; $attempt <= $this->maxRetries; ++$attempt) {
if ($this->closed) {
return new ErrorFuture(new RuntimeException('Transport is closed'));
}

Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry loop condition for ($attempt = 0; $attempt <= $this->maxRetries; ++$attempt) will execute zero times if maxRetries is negative, resulting in an immediate Unknown transport error after retries without any send attempt. Consider normalizing maxRetries to >= 0 (e.g., in the constructor) or throwing on invalid values to avoid silent misconfiguration.

Copilot uses AI. Check for mistakes.
Comment on lines 28 to 32
return [
OnWorkerExit::class,
AfterHandle::class,
AfterExecute::class,
];
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subscribing to both AfterHandle and AfterExecute means process() will run twice for a single CLI command in environments where both events are dispatched, causing duplicate shutdown attempts. Consider guarding so shutdown runs only once (e.g., track a local shutdownCalled flag) or subscribe to only one of these events if possible.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants