Asynchronous Token Bucket Rate Limiter by essiebx · Pull Request #35 · microsoft/Webwright

essiebx · 2026-05-29T09:19:19Z

Summary

This PR introduces an optional asynchronous token bucket rate limiter to smooth bursty LLM traffic and reduce request-rate-based 429 Too Many Requests errors.

The limiter is applied at a single choke point (BaseModel._post_with_retries) and is provider-agnostic across OpenAI, Anthropic, and OpenRouter.

When disabled (throttle_rate: 0.0, the default), the limiter is bypassed entirely: the implementation introduces no behavioral changes and zero runtime overhead.

Changes

1. AsyncTokenBucket Implementation

Location: src/webwright/utils/throttle.py

Provides an async-safe token bucket rate limiter that:

Enforces request rate limits before API calls
Uses time-based token refill logic
Avoids busy-waiting via asyncio.sleep

Purpose: Pace request bursts on the client side before they reach external APIs, instead of relying on the provider to reject them with a 429.
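
As an illustration of the three properties listed above, here is a minimal sketch of an async token bucket. It is not the code in src/webwright/utils/throttle.py: the constructor signature, the validation rules, and the private attribute names are assumptions made for the example.

```python
import asyncio
import time


class AsyncTokenBucket:
    """Illustrative token bucket (assumed shape, not the PR's implementation)."""

    def __init__(self, rate: float, capacity: float) -> None:
        # Validation rules are assumptions; the PR only says both are validated.
        if rate <= 0:
            raise ValueError("rate must be positive")
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._rate = rate                  # tokens added per second
        self._capacity = capacity          # maximum burst size
        self._tokens = float(capacity)     # start full so an initial burst passes
        self._last = time.monotonic()
        self._lock = asyncio.Lock()

    def _refill(self) -> None:
        # Time-based refill: credit tokens for the time elapsed since the last check.
        now = time.monotonic()
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now

    async def acquire(self) -> None:
        # The lock serializes waiters, so only one coroutine updates the bucket at a time.
        async with self._lock:
            while True:
                self._refill()
                if self._tokens >= 1:
                    self._tokens -= 1
                    return
                # Sleep for the time needed to accrue the missing fraction
                # of a token, rather than polling in a tight loop.
                await asyncio.sleep((1 - self._tokens) / self._rate)
```

With rate=50.0 and capacity=2, the first two calls to acquire() return immediately and the third waits roughly 20 ms for a token to accrue.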

2. Process-Global Keyed Registry

Introduced a registry mapping:

(rate, capacity) → AsyncTokenBucket

Purpose:

Ensures independent throttles for model configurations with different (rate, capacity) values
Shares a single bucket across configurations with identical (rate, capacity) values
Prevents cross-configuration interference within the same process
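
The registry idea can be sketched as a module-level dictionary keyed by the (rate, capacity) pair. The function name get_bucket, the module-level names, and the use of threading.Lock are hypothetical; the bucket class here is a bare stand-in so the snippet runs on its own.

```python
import threading
from typing import Dict, Tuple


class AsyncTokenBucket:
    """Stand-in for the real bucket; only stores its settings."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity


# Process-global registry: one bucket per distinct (rate, capacity) pair.
_REGISTRY: Dict[Tuple[float, float], AsyncTokenBucket] = {}
_REGISTRY_LOCK = threading.Lock()  # guards creation if models are built from several threads


def get_bucket(rate: float, capacity: float) -> AsyncTokenBucket:
    """Return the shared bucket for this configuration, creating it on first use."""
    key = (float(rate), float(capacity))  # normalize so 2 and 2.0 map to the same key
    with _REGISTRY_LOCK:
        bucket = _REGISTRY.get(key)
        if bucket is None:
            bucket = AsyncTokenBucket(*key)
            _REGISTRY[key] = bucket
        return bucket
```

Under this keying, two models configured with the same rate and capacity draw from one bucket, while a model with different settings gets its own.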

3. Configuration Updates

Added the following fields to BaseModelConfig:

throttle_rate (requests per second)
throttle_capacity (burst size)

Default:

throttle_rate: 0.0

Purpose: Fully opt-in behavior with no overhead when disabled.
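
A rough picture of the two fields, written as a standalone dataclass. The class name ThrottleSettings, the throttle_enabled helper, the field types, and the default for throttle_capacity are all assumptions; only the field names and the 0.0 default for throttle_rate come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThrottleSettings:
    """Hypothetical stand-in for the two new BaseModelConfig fields."""

    throttle_rate: float = 0.0      # requests per second; 0.0 disables throttling
    throttle_capacity: float = 1.0  # burst size; this default is an assumption

    @property
    def throttle_enabled(self) -> bool:
        # Throttling is opt-in: it applies only when a positive rate is set.
        return self.throttle_rate > 0.0
```

For example, ThrottleSettings() leaves throttling off, while ThrottleSettings(throttle_rate=2.0, throttle_capacity=5) would allow bursts of five requests refilled at two per second.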

4. Integration into Request Pipeline

Throttling is enforced inside:

BaseModel._post_with_retries()

Before each external API request:

await bucket.acquire()

Purpose: Ensures all provider requests pass through a single unified rate-limiting choke point.
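
The placement described above can be sketched as a retry loop that acquires a token before every attempt, including retries. This is a simplified free function, not the real BaseModel._post_with_retries: the send callable, the max_attempts parameter, and the choice of ConnectionError as the retryable error are assumptions.

```python
from typing import Awaitable, Callable, Optional, Protocol


class Bucket(Protocol):
    async def acquire(self) -> None: ...


async def post_with_retries(
    send: Callable[[], Awaitable[dict]],
    bucket: Optional[Bucket] = None,
    max_attempts: int = 3,
) -> dict:
    """Call send(), taking one token per attempt when a bucket is configured."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        if bucket is not None:
            # Throttle before each external request; retries count as requests too.
            await bucket.acquire()
        try:
            return await send()
        except ConnectionError as exc:  # assumed retryable error for this sketch
            last_error = exc
    assert last_error is not None
    raise last_error
```

Passing bucket=None models the disabled case: the loop skips the acquire step and behaves like an unthrottled retry loop.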

5. Test Coverage

Location: tests/unit/test_throttle.py

Added 15 unit tests covering:

Rate and capacity validation
Burst behavior
Token refill correctness
Blocking behavior under exhaustion
Registry reuse per configuration
Concurrent access safety (asyncio.Lock)
Backward compatibility
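
To give a flavor of the burst and exhaustion cases, here is a sketch of two checks written as plain coroutines. They are not taken from tests/unit/test_throttle.py; TinyBucket is a condensed stand-in bucket included only so the checks run on their own, and the timeouts are arbitrary.

```python
import asyncio
import time


class TinyBucket:
    """Condensed stand-in bucket used only by the checks below."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()
        self.lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self.lock:
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                await asyncio.sleep((1 - self.tokens) / self.rate)


async def check_burst_is_immediate() -> bool:
    # A full bucket should serve `capacity` concurrent callers without waiting.
    bucket = TinyBucket(rate=1.0, capacity=3)
    await asyncio.wait_for(
        asyncio.gather(*(bucket.acquire() for _ in range(3))), timeout=0.5
    )
    return True


async def check_blocks_when_exhausted() -> bool:
    # After the only token is taken, the next acquire must wait about a second,
    # so a short timeout is expected to fire first.
    bucket = TinyBucket(rate=1.0, capacity=1)
    await bucket.acquire()
    try:
        await asyncio.wait_for(bucket.acquire(), timeout=0.05)
    except asyncio.TimeoutError:
        return True
    return False
```

Each check can be run with asyncio.run(...) or wrapped in an async test function under whatever async test plugin the project uses.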

Backward Compatibility

No breaking changes.

Default configuration disables throttling
Existing behavior remains unchanged unless explicitly enabled

Scope

This PR addresses request-rate-based throttling only.

It does not handle API quota exhaustion errors (insufficient_quota), which are unrelated to rate limiting logic.

essiebx added 2 commits May 29, 2026 10:38

Add token bucket rate limiter for unified LLM API throttling

bf7700f

refactor: replace singleton throttle with keyed registry

fdf883c

essiebx wants to merge 2 commits into microsoft:main from essiebx:feat/token-bucket-rate-limiter-v2
