Summary
Webhook processing is currently fully synchronous and updates counters/gauges directly from each delivery. That makes correctness and latency sensitive to duplicates, out-of-order events, bursty traffic, and any future increase in processing cost.
This issue tracks hardening the ingestion pipeline so promgithub can safely handle higher event volume and retries without corrupting metrics or slowing webhook acknowledgements.
Why this matters
- GitHub webhook deliveries can be retried or redelivered, so the same delivery may arrive more than once.
- Events can arrive out of order.
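- Synchronous processing adds the full processing time to every webhook acknowledgement, so any slowdown in metric handling directly increases webhook latency.
- Direct gauge mutation without tracked state can drift: a duplicated, missed, or reordered event leaves the gauge permanently off, because nothing reconciles it afterwards.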
Goals
- Decouple webhook acknowledgement from downstream metric processing.
- Add deduplication and explicit event-state handling, including for out-of-order events.
- Improve correctness for in-progress and queued gauges.
- Make backpressure visible and define explicit behavior under overload.
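To make the first and last goals concrete, here is a minimal sketch of one possible shape for the hand-off, written in Python for brevity and not tied to promgithub's actual code. The `Ingest` class, its `submit`/`drain` methods, and the `dropped`/`errors` counters are hypothetical names. The webhook handler would only enqueue, while workers process in the background and a full queue is counted rather than blocking the handler.

```python
import queue
import threading


class Ingest:
    """Bounded hand-off between the webhook handler and metric processing.

    Hypothetical sketch: names and defaults are illustrative only.
    """

    def __init__(self, process, max_queue=1000, workers=4):
        self._process = process                 # downstream callback (assumed)
        self._queue = queue.Queue(maxsize=max_queue)
        self._lock = threading.Lock()
        self.dropped = 0                        # events rejected because the queue was full
        self.errors = 0                         # events whose processing raised
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, event):
        """Called from the HTTP handler; never waits for processing."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1
            return False

    def depth(self):
        """Approximate number of events waiting to be processed."""
        return self._queue.qsize()

    def drain(self):
        """Block until every queued event has been handled (useful in tests)."""
        self._queue.join()

    def _run(self):
        while True:
            event = self._queue.get()
            try:
                self._process(event)
            except Exception:
                with self._lock:
                    self.errors += 1
            finally:
                self._queue.task_done()
```

What the handler returns when `submit` reports a full queue (for example, an error status so the delivery shows as failed, or a success status with the drop only counted) is exactly the overload behavior this issue needs to decide. The sketch does not pick one.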
Suggested scope
- Add delivery deduplication using GitHub delivery IDs.
- Introduce a bounded queue and worker pool for event processing.
- Track workflow/job state transitions using stable identifiers.
- Expose internal processing metrics such as queue depth, dropped events, and processing errors.
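For the deduplication item, a rough sketch of a bounded in-memory "seen" set keyed by delivery ID (the value GitHub sends in the `X-GitHub-Delivery` header). `DeliveryDeduper`, `first_time`, and the size and TTL defaults are assumptions for illustration, not existing promgithub code, and the right retention window would need to be chosen deliberately.

```python
import threading
import time
from collections import OrderedDict


class DeliveryDeduper:
    """Remembers recently seen delivery IDs, bounded by size and age.

    Hypothetical sketch: defaults are placeholders, not recommendations.
    """

    def __init__(self, max_entries=10000, ttl_seconds=3600.0, clock=time.monotonic):
        self._seen = OrderedDict()   # delivery ID -> time first seen, oldest first
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()

    def first_time(self, delivery_id):
        """Return True if this delivery ID has not been seen recently."""
        now = self._clock()
        with self._lock:
            # Drop expired entries from the oldest end.
            while self._seen:
                _, seen_at = next(iter(self._seen.items()))
                if now - seen_at < self._ttl:
                    break
                self._seen.popitem(last=False)
            if delivery_id in self._seen:
                return False
            self._seen[delivery_id] = now
            # Enforce the size cap by evicting the oldest entry.
            if len(self._seen) > self._max:
                self._seen.popitem(last=False)
            return True
```

A handler would call `first_time(delivery_id)` before updating any counter and skip the event when it returns `False`. Note that this only deduplicates within one process. With several instances behind a load balancer, the same delivery could reach a different instance, which is part of what the scaling documentation needs to cover.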
Child issues
- Add delivery deduplication for retried GitHub events.
- Implement asynchronous ingest with bounded worker pool and backpressure metrics.
- Redesign workflow/job gauge handling around tracked state transitions.
- Document scaling behavior for single-instance vs multi-instance deployments.
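As a starting point for the gauge redesign, a small sketch of deriving gauges from tracked per-job state instead of incrementing and decrementing on each event. It assumes job events carry a stable job ID and one of the states `queued`, `in_progress`, or `completed` (modelled on the `workflow_job` action values). `JobStateTracker` and `apply` are hypothetical names.

```python
# Rank of each state; a higher rank is later in a job's lifecycle.
_RANK = {"queued": 0, "in_progress": 1, "completed": 2}


class JobStateTracker:
    """Derives queued/in-progress gauges from the last accepted state per job.

    Hypothetical sketch: completed jobs are kept as tombstones forever here;
    a real version would need to expire them.
    """

    def __init__(self):
        self._state = {}                                 # job ID -> last accepted state
        self.gauges = {"queued": 0, "in_progress": 0}    # state -> jobs currently in it

    def apply(self, job_id, new_state):
        """Apply a transition; return False if it was unknown, stale, or repeated."""
        if new_state not in _RANK:
            return False
        old = self._state.get(job_id)
        if old is not None and _RANK[new_state] <= _RANK[old]:
            return False      # duplicate or out-of-order event: gauges untouched
        if old in self.gauges:
            self.gauges[old] -= 1
        if new_state in self.gauges:
            self.gauges[new_state] += 1
        self._state[job_id] = new_state
        return True
```

Because a transition is only accepted when it moves a job forward, a repeated `queued` event or an `in_progress` event arriving after `completed` leaves the gauges unchanged. The trade-off is that state must be retained per job, so memory use and expiry of finished jobs need to be part of the design.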
Acceptance criteria
- Webhook acknowledgement latency stays low under burst load and does not depend on downstream processing time.
- Duplicate deliveries do not inflate counters or corrupt gauges.
- Operators can observe overload and processing health through metrics.