feat: SDK-4417: emit ossdk.features_enabled in Otel per-event payload#2631
feat: SDK-4417: emit ossdk.features_enabled in Otel per-event payload#2631abdulraqeeb33 wants to merge 6 commits intomainfrom
Conversation
Adds a per-event Otel attribute carrying the canonical keys of feature flags currently enabled for this device (encoded as a sorted, comma- separated string; omitted when empty). Logs become easily queryable in Google Cloud Logs Explorer, e.g.: jsonPayload.attributes."ossdk.features_enabled":"sdk_background_threading" Read fresh on every emission so each record reflects FeatureManager's current featureStates snapshot — no SDK rebuild required for IMMEDIATE- mode flag changes, no late-bound state. - otel module: IOtelPlatformProvider.enabledFeatureFlags exposes the list as a platform-agnostic contract; OtelFieldsPerEvent encodes it. - core module (Android): IFeatureManager gains enabledFeatureKeys(); OtelPlatformProvider takes IFeatureManager via constructor injection (no @volatile var, no setter). OtelLifecycleManager and OneSignalCrashHandlerFactory thread it through. - OneSignalImp.initEssentials reorders so ensureApplicationServiceStarted runs before resolving FeatureManager from the IoC container. Made-with: Cursor
There was a problem hiding this comment.
Pull request overview
Adds per-event OpenTelemetry log context for active feature flags so remote logs can be queried by flag state (SDK-4417).
Changes:
- Adds
IFeatureManager.enabledFeatureKeys()and threadsIFeatureManagerthrough Otel lifecycle/platform provider creation. - Extends
IOtelPlatformProviderwithenabledFeatureFlagsand emits enabled flags inOtelFieldsPerEventas a sorted, comma-separated attribute. - Updates/extends unit tests across Otel + crash plumbing to validate per-event semantics and injection.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| OneSignalSDK/onesignal/otel/src/test/java/com/onesignal/otel/attributes/OtelFieldsTopLevelTest.kt | Adds negative test ensuring the feature-flags attribute is not top-level. |
| OneSignalSDK/onesignal/otel/src/test/java/com/onesignal/otel/attributes/OtelFieldsPerEventTest.kt | Adds tests for omit/encode/sort and per-call re-read behavior. |
| OneSignalSDK/onesignal/otel/src/main/java/com/onesignal/otel/attributes/OtelFieldsPerEvent.kt | Emits enabled feature flags as a per-event attribute (key currently mismatched vs spec/tests). |
| OneSignalSDK/onesignal/otel/src/main/java/com/onesignal/otel/IOtelPlatformProvider.kt | Adds a new enabledFeatureFlags contract for platform providers. |
| OneSignalSDK/onesignal/core/src/test/java/com/onesignal/internal/OtelLifecycleManagerTest.kt | Updates lifecycle tests for new IFeatureManager injection and factory forwarding. |
| OneSignalSDK/onesignal/core/src/test/java/com/onesignal/internal/OtelLifecycleManagerFaultTest.kt | Threads IFeatureManager through factories and verifies fault tolerance still holds. |
| OneSignalSDK/onesignal/core/src/test/java/com/onesignal/debug/internal/logging/otel/android/OtelPlatformProviderTest.kt | Updates provider creation and adds tests for enabledFeatureFlags. |
| OneSignalSDK/onesignal/core/src/test/java/com/onesignal/debug/internal/crash/OtelIntegrationTest.kt | Updates integration tests to pass a feature manager into provider creation. |
| OneSignalSDK/onesignal/core/src/test/java/com/onesignal/debug/internal/crash/OneSignalCrashUploaderWrapperTest.kt | Updates feature-manager mocks to support enabledFeatureKeys(). |
| OneSignalSDK/onesignal/core/src/test/java/com/onesignal/debug/internal/crash/OneSignalCrashHandlerFactoryTest.kt | Updates factory tests for new featureManager parameter. |
| OneSignalSDK/onesignal/core/src/test/java/com/onesignal/core/internal/features/FeatureManagerTests.kt | Adds unit coverage for enabledFeatureKeys() and activation-mode behavior. |
| OneSignalSDK/onesignal/core/src/main/java/com/onesignal/internal/OtelLifecycleManager.kt | Requires IFeatureManager and passes it into crash handler + platform provider factories. |
| OneSignalSDK/onesignal/core/src/main/java/com/onesignal/internal/OneSignalImp.kt | Ensures app service starts before resolving IFeatureManager, injects it into OtelLifecycleManager. |
| OneSignalSDK/onesignal/core/src/main/java/com/onesignal/debug/internal/logging/otel/android/OtelPlatformProvider.kt | Adds per-access enabledFeatureFlags backed by IFeatureManager.enabledFeatureKeys(). |
| OneSignalSDK/onesignal/core/src/main/java/com/onesignal/debug/internal/crash/OneSignalCrashUploaderWrapper.kt | Passes featureManager into platform provider creation. |
| OneSignalSDK/onesignal/core/src/main/java/com/onesignal/debug/internal/crash/OneSignalCrashHandlerFactory.kt | Adds featureManager param and injects it into platform provider creation. |
| OneSignalSDK/onesignal/core/src/main/java/com/onesignal/core/internal/features/FeatureManager.kt | Implements enabledFeatureKeys() (but currently enables a local override by default). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // operator; omitted entirely when no flags are enabled to keep payloads compact. | ||
| val enabledFlags = platformProvider.enabledFeatureFlags | ||
| if (enabledFlags.isNotEmpty()) { | ||
| attributes["ossdk.feature_flags"] = enabledFlags.sorted().joinToString(",") |
There was a problem hiding this comment.
OtelFieldsPerEvent is writing the enabled flags under the attribute key ossdk.feature_flags, but the PR (and the accompanying tests) specify the canonical per-event key as ossdk.features_enabled. With the current key, the new tests will fail and log queries/documentation will not match. Please rename the attribute key to ossdk.features_enabled (and keep the omit-when-empty behavior).
| attributes["ossdk.feature_flags"] = enabledFlags.sorted().joinToString(",") | |
| attributes["ossdk.features_enabled"] = enabledFlags.sorted().joinToString(",") |
| // private val localFeatureOverrides: Set<String> = emptySet() | ||
| private val localFeatureOverrides: Set<String> = | ||
| setOf(FeatureFlag.SDK_BACKGROUND_THREADING.key) |
There was a problem hiding this comment.
localFeatureOverrides is currently hard-coded to setOf(FeatureFlag.SDK_BACKGROUND_THREADING.key), which will force-enable background threading for all builds and environments. This looks like the local testing hook described in the comment above and should not be enabled by default. Please revert this to emptySet() (or gate it behind a debug-only build flag) so production behavior is driven solely by remote config / activation mode rules.
| // private val localFeatureOverrides: Set<String> = emptySet() | |
| private val localFeatureOverrides: Set<String> = | |
| setOf(FeatureFlag.SDK_BACKGROUND_THREADING.key) | |
| private val localFeatureOverrides: Set<String> = emptySet() |
| @@ -41,20 +42,21 @@ import com.onesignal.otel.crash.IOtelAnrDetector | |||
| @Suppress("TooManyFunctions") | |||
| internal class OtelLifecycleManager( | |||
| private val context: Context, | |||
| private val crashHandlerFactory: (Context, IOtelLogger) -> IOtelCrashHandler = | |||
| { ctx, log -> OneSignalCrashHandlerFactory.createCrashHandler(ctx, log) }, | |||
| private val featureManager: IFeatureManager, | |||
There was a problem hiding this comment.
The class KDoc still says production callers can use OtelLifecycleManager(context), but the constructor now requires an IFeatureManager. Please update this documentation to reflect the new required parameter (or adjust the example to show how it’s obtained/injected).
| // FeatureManager is constructor-injected into OtelLifecycleManager so OTel resource | ||
| // attributes (ossdk.features_enabled) reflect applied flag state from the very first |
There was a problem hiding this comment.
This comment refers to ossdk.features_enabled as an "OTel resource" attribute, but the implementation and PR description treat it as a per-event/log-record attribute. Please adjust the wording here to avoid confusion about where the attribute is emitted (resource vs per-event payload).
| // FeatureManager is constructor-injected into OtelLifecycleManager so OTel resource | |
| // attributes (ossdk.features_enabled) reflect applied flag state from the very first | |
| // FeatureManager is constructor-injected into OtelLifecycleManager so per-log/event | |
| // attributes like ossdk.features_enabled reflect applied flag state from the very first |
| * The order is not guaranteed; consumers that need a deterministic encoding (e.g. for | ||
| * stable log payloads) should sort before serializing. | ||
| */ | ||
| val enabledFeatureFlags: List<String> |
There was a problem hiding this comment.
IOtelPlatformProvider is a public interface in the published onesignal-otel module, and this PR adds a new abstract member (val enabledFeatureFlags). That is a breaking API/binary change for any downstream implementers of IOtelPlatformProvider (e.g., other platform/KMP implementations). If this interface is intended to be internal-only, consider marking it internal; otherwise, consider providing a default implementation (or updating the PR description to acknowledge the public contract change).
| val enabledFeatureFlags: List<String> | |
| val enabledFeatureFlags: List<String> | |
| get() = emptyList() |
| // private val localFeatureOverrides: Set<String> = emptySet() | ||
| private val localFeatureOverrides: Set<String> = | ||
| setOf(FeatureFlag.SDK_BACKGROUND_THREADING.key) |
There was a problem hiding this comment.
🔴 Local feature override accidentally committed: the production-safe emptySet() is commented out in favor of setOf(FeatureFlag.SDK_BACKGROUND_THREADING.key) (FeatureManager.kt:175-177), force-enabling SDK_BACKGROUND_THREADING for 100% of users on the released SDK. This bypasses the remote-config canary/percentage rollout for a high-risk threading-mode flag and is unrelated to this PR's OTel goal — please revert the override (re-comment the setOf line, restore emptySet()) before merge.
Extended reasoning...
What is wrong
In FeatureManager.kt:175-177, the localFeatureOverrides companion-object constant has been flipped from the production-safe value to the test-only override. The diff in this PR is:
- private val localFeatureOverrides: Set<String> = emptySet()
-// private val localFeatureOverrides: Set<String> =
-// setOf(FeatureFlag.SDK_BACKGROUND_THREADING.key)
+// private val localFeatureOverrides: Set<String> = emptySet()
+ private val localFeatureOverrides: Set<String> =
+ setOf(FeatureFlag.SDK_BACKGROUND_THREADING.key)The KDoc immediately above the constant (lines 170–174) explicitly labels it as a "Local-only test hook for forcing features ON without backend config. Add feature keys here while testing locally". The previous state had emptySet() active and the override commented out — the PR inverted them, so the test override is now the live binding shipped to users.
Why the existing code does not prevent it
refreshEnabledFeatures (same file) unions localFeatureOverrides into enabledFeatureKeys unconditionally on every refresh — there is no BuildConfig.DEBUG guard, no debuggable-package check, and no environment gate. The override applies in any build, including released production. Because SDK_BACKGROUND_THREADING is APP_STARTUP-latched, the flip is captured at the first refresh and then persisted for the entire session; remote config cannot turn it OFF mid-session.
The Logging.warn("OneSignal: Local feature override enabled for testing only: …") line in refreshEnabledFeatures will also fire on every device on every refresh, polluting production logs.
Step-by-step proof
- App with this SDK build starts.
FeatureManager.<init>callsrefreshEnabledFeatures(configModelStore.model, applyNextRunOnlyFeatures = true). enabledFeatureKeysis computed asmodel.sdkRemoteFeatureFlags.map(canonicalize) + localFeatureOverrides.map(canonicalize). With this diff,localFeatureOverrides = setOf("sdk_background_threading"), so"sdk_background_threading"is added regardless of what the backend returned.localFeatureOverrides.isNotEmpty()is true →Logging.warn("…enabled for testing only: [sdk_background_threading]")fires on every device.- Loop over
FeatureFlag.entries. ForSDK_BACKGROUND_THREADING:desiredState = isEnabledIn(enabledFeatureKeys) = true.activationMode = APP_STARTUP,applyNextRunOnlyFeatures = true→nextStates[SDK_BACKGROUND_THREADING] = trueandapplySideEffects(...)callsThreadingMode.updateUseBackgroundThreading(enabled = true, …). - The flag stays latched true for the lifetime of the process. Subsequent remote-config refreshes call
refreshEnabledFeatures(..., applyNextRunOnlyFeatures = false), which for APP_STARTUP-mode flags only logs but does not change the runtime state — so even a backend rollback cannot disable it for an already-running device.
Impact
- Every device running this released SDK has
SDK_BACKGROUND_THREADINGforce-enabled at startup, regardless of the backend rollout. - The canary / percentage rollout system that exists specifically for high-risk flags like this one (a threading-mode change) is fully bypassed.
- The "testing only" warning log is emitted on every device, every refresh.
- This is unrelated to the PRs stated goal of emitting
ossdk.features_enabledin the per-event OTel payload — the new OTel attribute reads fromenabledFeatureKeys(), which would correctly reflect remote config alone iflocalFeatureOverrides = emptySet().
Fix
Revert the override before merge: re-comment the setOf(...) line and uncomment the emptySet() line, so lines 175-177 read:
private val localFeatureOverrides: Set<String> = emptySet()
// private val localFeatureOverrides: Set<String> =
// setOf(FeatureFlag.SDK_BACKGROUND_THREADING.key)This is almost certainly a developers local testing setup that was committed by accident.
| val enabledFlags = platformProvider.enabledFeatureFlags | ||
| if (enabledFlags.isNotEmpty()) { | ||
| attributes["ossdk.feature_flags"] = enabledFlags.sorted().joinToString(",") |
There was a problem hiding this comment.
🔴 The new per-event attribute is emitted under the key ossdk.feature_flags (OtelFieldsPerEvent.kt:36) but the PR description, doc comments, and every new test assertion use ossdk.features_enabled. This breaks the three new positive tests in OtelFieldsPerEventTest.kt (they look up the missing key and get null) and the documented Logs Explorer query jsonPayload.attributes."ossdk.features_enabled":"..." will never match anything in production. Rename the literal on line 36 to ossdk.features_enabled to match the spec/tests.
Extended reasoning...
Attribute key mismatch
The production code emits the new per-event attribute under one name while the entire rest of the PR — tests, doc comments, PR description, and the example Logs Explorer query — uses a different name.
Where it diverges:
OtelFieldsPerEvent.kt:36writesattributes["ossdk.feature_flags"] = enabledFlags.sorted().joinToString(",")- All other references — including the doc comments at
OneSignalImp.kt:244,OtelPlatformProvider.kt:68, andOneSignalCrashHandlerFactory.kt:16,26— and the new test assertions atOtelFieldsPerEventTest.kt:85,99,113,116andOtelFieldsTopLevelTest.kt:85— use"ossdk.features_enabled". - The PR title and description (including the GCP Logs Explorer example query
jsonPayload.attributes."ossdk.features_enabled":"sdk_background_threading") also usefeatures_enabled.
A grep across the repo shows "ossdk.feature_flags" appears only on the one production line introduced by this PR; "ossdk.features_enabled" appears everywhere else.
Why existing code does not prevent it: the value is simply written to a MutableMap<String, String>. There is no compile-time check that the key matches the test/spec name; the typo passes the type checker.
Impact:
- The three new positive tests in
OtelFieldsPerEventTest.ktwill fail at runtime. For example, on line 85 the test assertsattributes["ossdk.features_enabled"] shouldBe "sdk_background_threading", but production only ever writesattributes["ossdk.feature_flags"]. The map lookup of the wrong key returnsnull, so the kotest assertionnull shouldBe "sdk_background_threading"fails. The same failure applies to the multi-flag-sorted test and the per-event re-read test. (The empty-flags test happens to pass only because neither key is set.) - The negative test in
OtelFieldsTopLevelTest.kt:85still passes (it asserts the per-event-only key is not at the top level), but it asserts the wrong key for what production actually emits. - The PRs stated queryability goal is broken. The documented query
jsonPayload.attributes."ossdk.features_enabled":"sdk_background_threading"would never match an emitted record because the actual attribute name in production logs isossdk.feature_flags. This silently defeats the entire SDK-4417 motivation.
Step-by-step proof:
- App enables
SDK_BACKGROUND_THREADING(the default in this PR — see the uncommentedlocalFeatureOverridesinFeatureManager.kt). OtelFieldsPerEvent.getAttributes()runs on a log emission.enabledFlags = platformProvider.enabledFeatureFlagsreturns["sdk_background_threading"].- Code path enters the
if (enabledFlags.isNotEmpty())branch. - The map gets
attributes["ossdk.feature_flags"] = "sdk_background_threading". - The test in
OtelFieldsPerEventTest.kt:85runsattributes["ossdk.features_enabled"] shouldBe "sdk_background_threading". Map lookup of"ossdk.features_enabled"returnsnull(the only key set is"ossdk.feature_flags"). The assertionnull shouldBe "sdk_background_threading"fails the test. - In production, the emitted log record contains
jsonPayload.attributes."ossdk.feature_flags": "sdk_background_threading". A user running the documented Logs Explorer queryjsonPayload.attributes."ossdk.features_enabled":"sdk_background_threading"matches zero records.
Fix: change the literal on OtelFieldsPerEvent.kt:36 from "ossdk.feature_flags" to "ossdk.features_enabled". Tests, doc comments, and the spec are already internally consistent on features_enabled, so a single one-character edit aligns the production code with the rest of the PR.
Replaces the constructor-injected IFeatureManager with a () -> IFeatureManager supplier so the OTel pipeline can come up before service bootstrap completes. The supplier is an immutable constructor val (no @volatile, no setter); it's invoked per-event when reading enabledFeatureFlags, which is when services are guaranteed to be ready. - OtelPlatformProvider: featureManager -> featureManagerProvider lambda; enabledFeatureFlags getter calls it on every access. - OtelLifecycleManager + OneSignalCrashHandlerFactory: signatures updated to thread the supplier through. - OneSignalImp.initEssentials: restored to its original order (OtelLifecycleManager constructed first so crash/log capture starts early, ensureApplicationServiceStarted at the end). Wires the supplier as { services.getService<IFeatureManager>() }. - OneSignalCrashUploaderWrapper: forwards its existing IFeatureManager via a closure { featureManager } supplier. - Reverts a stray local-test toggle of FeatureManager.localFeatureOverrides that snuck into the previous commit. - Fixes a doc typo: ossdk.features_flags -> ossdk.features_enabled. Tests updated to lambda form (createAndroidOtelPlatformProvider(ctx) { fm }). Made-with: Cursor
Aligns the per-event attribute name with the more standard "feature_flags" naming (was inconsistently called features_enabled in tests and docs while the impl already used feature_flags). Query example: jsonPayload.attributes."ossdk.feature_flags":"sdk_background_threading" Made-with: Cursor
Two doc/contract fixes from Copilot's review on PR #2631: - OtelLifecycleManager class KDoc: previously described the production constructor as `OtelLifecycleManager(context)`, but it now requires a featureManagerProvider supplier. Updated example to reflect the new signature. - IOtelPlatformProvider.enabledFeatureFlags: provided a default getter returning emptyList() so adding this property is no longer a binary- breaking change for downstream KMP / platform implementers. Platforms that want to populate `ossdk.feature_flags` opt in by overriding. The two larger review findings (accidental localFeatureOverrides flip, attribute-key mismatch) were already addressed in earlier commits on this branch. Made-with: Cursor
This reverts commit a5bbdb2.
Description
One Line Summary
SDK emits the canonical keys of currently-enabled feature flags as a per-event Otel attribute (
ossdk.features_enabled) so logs are queryable by feature flag in Google Cloud Logs Explorer.Details
Motivation
SDK-4417. When debugging SDK behavior from Otel logs, we need to know which feature flags were active at the moment a record was written. Without this, a regression that only manifests when a flag is on (e.g.
sdk_background_threading) is hard to pinpoint from logs alone.Scope
ossdk.features_enabled— a sorted, comma-separated list of canonical flag keys; omitted entirely when no flags are enabled. Example value:"sdk_background_threading".FeatureManager.featureStatessnapshot. Empty string is never written; absent attribute = no flags enabled.IOtelPlatformProvider.enabledFeatureFlagsis the platform-agnostic contract; the Android impl delegates toIFeatureManager. iOS or other platforms can implement their own source.Architecture
Strict constructor injection — no late-bound state, no
@Volatile var, no setters:OtelFieldsPerEvent.getAttributes()readsplatformProvider.enabledFeatureFlagson every log emission, soIMMEDIATE-mode flag changes show up on the very next record without rebuilding the Otel SDK.Querying example
In Google Cloud Logs Explorer:
The
:operator does substring contains; flag keys are unique full identifiers, so substring matching has no false positives.Testing
Unit testing
FeatureManagerTests— added coverage for newenabledFeatureKeys(): empty by default, returns canonical key when enabled, respectsAPP_STARTUPlatching afteronModelReplaced.OtelFieldsPerEventTest— added cases for empty, single flag, multiple sorted, and "re-read on every call" (per-event semantics).OtelFieldsTopLevelTest— added negative case confirmingossdk.features_enabledis not at top level.OtelPlatformProviderTest— three cases: empty when manager has no flags, fresh per-access from injected manager, empty when manager throws.OtelLifecycleManagerTest— verifiesIFeatureManageris forwarded to the platform provider via the factory.OtelLifecycleManagerFaultTest,OneSignalCrashHandlerFactoryTest,OneSignalCrashUploaderWrapperTest,OtelIntegrationTest— updated to thread the new constructor parameter through.All tests in the affected suites pass; pre-existing
SDKInitTestsfailures onmainare unrelated.Manual testing
Not yet exercised end-to-end against a staging app. The wire format is straightforward (one extra string attribute on each
LogRecord), and unit tests cover the resolution path. Worth a smoke test once merged.Affected code checklist
(Otel/observability is not in the affected-code list — this PR only touches the Otel pipeline and feature flag plumbing.)
Checklist
Overview
Testing
SDKInitTestsfailures onmain)Final pass