
feat(otel): Add OpenTelemetry plugin module with deterministic tracing #426

Open: ayushiahjolia wants to merge 1 commit into main from otel-plugin

@ayushiahjolia ayushiahjolia commented Jun 4, 2026

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Issue Link, if available

#333

Description

Add OpenTelemetry plugin module for durable execution tracing.

Changes

New otel-plugin Maven module providing a DurableExecutionPlugin implementation that emits OTel spans for durable execution lifecycle events.

Covered in this PR:

  • All invocations of the same execution share a single deterministic trace ID
  • Operation spans are opened when operations start and closed when they complete or the invocation ends
  • Attempt spans wrap user code execution with Context.makeCurrent() so that auto-instrumented AWS SDK calls nest correctly
  • The X-Ray trace context is extracted from the Lambda environment and used as the parent for invocation spans
  • Deterministic sampling ensures a consistent trace/no-trace decision across all invocations of the same execution
  • The SLF4J MDC is enriched with trace_id, span_id, and execution ARN during user code execution for log-trace correlation
  • Spans include these attributes: execution ARN, operation ID, type, subtype, name, parent ID, attempt number, outcome
  • forceFlush is called before Lambda freezes the environment so that spans are exported
  • Plugin errors are isolated and never disrupt SDK execution
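
To illustrate the first bullet, here is a minimal sketch of one way to derive stable IDs, assuming the trace ID is hashed from the execution ARN and the span ID from the ARN plus the operation ID. The class `DeterministicIdSketch`, its method names, the hash inputs, and the sample ARN are all hypothetical; the PR's `DeterministicIdGenerator` may work differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not the PR's DeterministicIdGenerator. */
final class DeterministicIdSketch {

    /** Returns 32 lowercase hex chars (16 bytes), the size of a W3C trace ID. */
    static String traceIdFor(String executionArn) {
        return hexPrefix(sha256("trace:" + executionArn), 16);
    }

    /** Returns 16 lowercase hex chars (8 bytes), the size of a W3C span ID. */
    static String spanIdFor(String executionArn, String operationId) {
        return hexPrefix(sha256("span:" + executionArn + "/" + operationId), 8);
    }

    private static byte[] sha256(String input) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    /** Hex-encodes the first byteCount bytes of the digest. */
    private static String hexPrefix(byte[] bytes, int byteCount) {
        StringBuilder sb = new StringBuilder(byteCount * 2);
        for (int i = 0; i < byteCount; i++) {
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up ARN, used only to exercise the helper.
        String arn = "arn:aws:lambda:us-east-1:123456789012:function:demo";
        System.out.println(traceIdFor(arn));
        System.out.println(spanIdFor(arn, "op-1"));
    }
}
```

Because the IDs depend only on their inputs, a replayed invocation would compute the same values as the original one. A production generator would also need to guard against the all-zero ID, which the tracing specifications treat as invalid.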

Not covered in this PR:

  • Spans for virtual contexts
  • Duplicate onOperationStart during replay

Usage

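// Share one DeterministicIdGenerator between the tracer provider and the plugin.
var idGenerator = new DeterministicIdGenerator();
var tracerProvider = SdkTracerProvider.builder()
    .setIdGenerator(idGenerator)
    // exporter is any SpanExporter, e.g. LoggingSpanExporter.create()
    .addSpanProcessor(SimpleSpanProcessor.create(exporter))
    .build();

// In the durable handler, register the plugin:
@Override
protected DurableConfig createConfiguration() {
    return DurableConfig.builder()
        .withPlugins(new OpenTelemetryDurablePlugin(tracerProvider, idGenerator))
        .build();
}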

Demo/Screenshots

[Screenshot: 2026-06-04 at 4:44:48 PM]

Checklist

  • I have filled out every section of the PR template
  • I have thoroughly tested this change

Testing

Unit Tests

Have unit tests been written for these changes? Yes

Integration Tests

Have integration tests been written for these changes? Yes

Examples

Has a new example been added for the change? (if applicable) Yes

ayushiahjolia force-pushed the otel-plugin branch 6 times, most recently from aa8f1b6 to 96492d2, on June 4, 2026 22:40
ayushiahjolia marked this pull request as ready for review on June 4, 2026 23:46
ayushiahjolia requested a review from a team on June 4, 2026 23:46
Comment thread examples/template.yaml
- !Ref JavaVersion
- runtime
Handler: "software.amazon.lambda.durable.examples.general.OtelExample"
Role: !Ref RoleArn

This function requires additional OpenTelemetry layers to work

.setIdGenerator(idGenerator)
.addSpanProcessor(SimpleSpanProcessor.create(LoggingSpanExporter.create()))
.build();
var otelPlugin = new OpenTelemetryDurablePlugin(tracerProvider, idGenerator);

Do we allow users to choose their own id generator for our plugin?

public void onOperationStart(OperationInfo info) {
    if (!sampled || info.id() == null) return;

    idGenerator.setNextSpanOperationId(info.id());

This wouldn't work in a multithreaded environment. See the similar issue in the Python SDK: aws/aws-durable-execution-sdk-python#422
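
One possible mitigation, sketched below, is to keep the pending operation ID in a `ThreadLocal` so that concurrent operations cannot overwrite each other's value. This assumes the span is created on the same thread that called `setNextSpanOperationId`, which may not hold in the plugin. The class `ThreadScopedOperationIdSketch` and the `takeNextSpanOperationId` and `handOff` methods are hypothetical and are not part of the PR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: per-thread hand-off of the next operation ID. */
final class ThreadScopedOperationIdSketch {

    // Each thread sees only the value it set itself.
    private static final ThreadLocal<String> PENDING = new ThreadLocal<>();

    static void setNextSpanOperationId(String operationId) {
        PENDING.set(operationId);
    }

    /** Returns and clears the ID set by the calling thread, or null if none was set. */
    static String takeNextSpanOperationId() {
        String id = PENDING.get();
        PENDING.remove();
        return id;
    }

    /** Sets and then takes on the calling thread, yielding in between so other threads can interleave. */
    static String handOff(String operationId) {
        setNextSpanOperationId(operationId);
        Thread.yield();
        return takeNextSpanOperationId();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<String> a = pool.submit(() -> handOff("op-a"));
            Future<String> b = pool.submit(() -> handOff("op-b"));
            // Each task reads back its own ID regardless of interleaving.
            System.out.println(a.get() + " " + b.get()); // prints "op-a op-b"
        } finally {
            pool.shutdown();
        }
    }
}
```

With a single shared field, the second `set` could replace the first before either value was read. Passing the operation ID explicitly at span creation would avoid the hand-off entirely, if the tracing API in use allows it.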

* @deprecated This is a preview API that is experimental and may be changed or removed in future releases.
*/
@Deprecated
public final class SamplingUtil {

I think OpenTelemetry already provides a similar sampler, TraceIdRatioBased.
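
For context, the OpenTelemetry Java SDK exposes this sampler through `Sampler.traceIdRatioBased(double)`. Its decision depends only on the trace ID and the ratio, so with a deterministic trace ID it should give the same answer on every invocation of an execution. The sketch below shows the general idea in plain Java; `TraceIdRatioSketch` is a hypothetical class and is not the SDK's implementation, whose exact threshold arithmetic may differ.

```java
/** Hypothetical sketch of a trace-ID-ratio sampling decision. */
final class TraceIdRatioSketch {

    /**
     * Decides from the low 64 bits (last 16 hex chars) of a 32-char hex trace ID.
     * The same trace ID and ratio always produce the same decision.
     */
    static boolean shouldSample(String traceIdHex, double ratio) {
        if (ratio >= 1.0) return true;
        if (ratio <= 0.0) return false;
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        long nonNegative = low & Long.MAX_VALUE;          // clear the sign bit
        long upperBound = (long) (ratio * Long.MAX_VALUE); // threshold for this ratio
        return nonNegative < upperBound;
    }

    public static void main(String[] args) {
        // Made-up trace ID, used only to exercise the helper.
        String traceId = "0123456789abcdef0123456789abcdef";
        System.out.println(shouldSample(traceId, 0.5));
    }
}
```

If the built-in sampler is adopted, a custom `SamplingUtil` may be unnecessary, provided the trace ID is already fixed when the sampler runs.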
