Add Bugsnag error grouping with stable normalized keys #234
Open
morgan-wowk wants to merge 1 commit into bugsnag/orchestrator-integration from
Introduces error_normalization.py, which strips instance-specific values (pod names, IDs, memory addresses, byte offsets) from exceptions so structurally identical errors collapse to one group in Bugsnag.
TANGLE_BUGSNAG_CUSTOM_GROUPING_KEY controls the metadata key name and is a no-op when unset, allowing Shopify deployments to set it without touching OSS code.
System errors reported via record_system_error_exception are prefixed with "SYSTEM_ERROR: " for easy filtering.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add Bugsnag error grouping with stable normalized keys
Introduces configurable error grouping so structurally identical exceptions collapse into a single group rather than creating a new entry per unique pod name, UUID, or memory address.
How it works
A new `TANGLE_BUGSNAG_CUSTOM_GROUPING_KEY` env var controls the metadata key name written on each Bugsnag event. When unset, the feature is a complete no-op. When set (by a deployment), every notified exception gets a `custom[<key>]` tab containing a normalized string derived from the exception type and message.
System errors reported through `record_system_error_exception` are additionally prefixed with `SYSTEM_ERROR:` so they can be filtered or grouped separately from non-system errors.
Error taxonomy
The following exception types, from one consumer use case, are normalized to stable grouping keys:
- kubernetes ApiException (404): NotFound: pods "{pod}" not found
- kubernetes ApiException (400): BadRequest: container "main" in pod {pod} is terminated
- kubernetes ApiException (400): BadRequest: container "main" in pod {pod} is waiting to start: PodInitializing
- kubernetes ApiException (400): BadRequest: container "main" in pod {pod} is not available
- kubernetes ApiException (500): InternalError: failed calling webhook "<>": context deadline exceeded
- UnicodeDecodeError: 'utf-8' codec can't decode byte at position {n}
- MaxRetryError: k8s connection pool max retries exceeded (ReadTimeoutError)
- OrchestratorError: Unexpected running container status: {object}
- ExceptionType: {message with addresses/UUIDs/IDs stripped}
Many exception types (e.g. `AttributeError`, `sqlalchemy.exc.OperationalError`) already produce stable messages and pass through the fallback unchanged.
Changes
- `error_normalization.py` (new): one public function, `normalize_error_message(*, exception)`, which dispatches to type-specific handlers before falling back to a generic stripper that removes hex addresses, UUIDs, and long alphanumeric IDs
- `bugsnag_instrumentation.py`: reads `TANGLE_BUGSNAG_CUSTOM_GROUPING_KEY`; `_before_notify` attaches the normalized key when configured; supports an optional `grouping_prefix` passed through `notify(**metadata)`
- `orchestrator_sql.py`: `record_system_error_exception` passes `grouping_prefix="SYSTEM_ERROR"` so system errors are visually distinct
- `test_error_normalization.py` (new): 15 unit tests covering all error groups and the fallback path
OSS note
The grouping key name is not hardcoded; it is supplied entirely via `TANGLE_BUGSNAG_CUSTOM_GROUPING_KEY` at deploy time, so no internal platform names appear in OSS code.
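Illustrative sketch
For reviewers who want a concrete picture of the fallback path and the env-var gate described above, here is a minimal sketch. It is not the code in this PR: the function names `sketch_normalize` and `sketch_grouping_metadata`, the regexes, the `{addr}`/`{uuid}`/`{id}` placeholders, and the 16-character threshold for a "long alphanumeric ID" are all assumptions made for illustration. It returns a plain dict rather than calling Bugsnag, and it omits the type-specific handlers.

```python
import os
import re

# Hypothetical patterns; the PR's actual regexes are not shown in the description.
_UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
_HEX_ADDR_RE = re.compile(r"\b0x[0-9a-fA-F]+\b")
# Assumption: a "long alphanumeric ID" is 16+ alphanumerics with at least one digit.
_LONG_ID_RE = re.compile(r"\b(?=[A-Za-z0-9]*\d)[A-Za-z0-9]{16,}\b")


def sketch_normalize(*, exception: BaseException) -> str:
    """Generic fallback only: strip instance-specific tokens from the message."""
    message = str(exception)
    # UUIDs go first so their segments are not partially matched by later patterns.
    message = _UUID_RE.sub("{uuid}", message)
    message = _HEX_ADDR_RE.sub("{addr}", message)
    message = _LONG_ID_RE.sub("{id}", message)
    return f"{type(exception).__name__}: {message}"


def sketch_grouping_metadata(*, exception: BaseException, grouping_prefix=None) -> dict:
    """Build the metadata that would be attached to an event, or {} when unconfigured."""
    key_name = os.environ.get("TANGLE_BUGSNAG_CUSTOM_GROUPING_KEY")
    if not key_name:
        # Env var unset: the feature is a no-op.
        return {}
    value = sketch_normalize(exception=exception)
    if grouping_prefix:
        # Mirrors the "SYSTEM_ERROR: " prefix described for system errors.
        value = f"{grouping_prefix}: {value}"
    return {"custom": {key_name: value}}
```

Under these assumptions, two exceptions that differ only in a memory address or UUID produce the same string, while messages that are already stable (such as a typical `AttributeError`) pass through unchanged apart from the type-name prefix.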