
Asynchronous Token Bucket Rate Limiter #35

Open
essiebx wants to merge 2 commits into microsoft:main from essiebx:feat/token-bucket-rate-limiter-v2

essiebx commented May 29, 2026

Summary

This PR introduces an optional asynchronous token bucket rate limiter to smooth bursty LLM traffic and reduce request-rate-based 429 Too Many Requests errors.

The limiter is applied at a single choke point (BaseModel._post_with_retries) and is provider-agnostic across OpenAI, Anthropic, and OpenRouter.

When disabled (throttle_rate: 0.0, the default), the limiter is bypassed entirely: behavior is unchanged and no throttling overhead is added.

Changes

1. AsyncTokenBucket Implementation

Location: src/webwright/utils/throttle.py

Provides an async-safe token bucket rate limiter that:

  • Enforces request rate limits before API calls
  • Uses time-based token refill logic
  • Avoids busy-waiting via asyncio.sleep

Purpose: Smooth bursts on the client side so they are spread out before they reach the external API and trigger its rate limiting.
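To make the three properties above concrete, here is a minimal sketch of an async token bucket. It is an illustration under stated assumptions, not the code in src/webwright/utils/throttle.py: the constructor signature, the validation rules, and the private attribute names are all hypothetical.

```python
import asyncio
import time


class AsyncTokenBucket:
    """Illustrative token bucket; names and validation rules are assumptions."""

    def __init__(self, rate: float, capacity: float) -> None:
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be > 0 and capacity must be >= 1")
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last = time.monotonic()
        self._lock = asyncio.Lock()

    def _refill(self) -> None:
        # Time-based refill: credit tokens for the time elapsed, capped at capacity.
        now = time.monotonic()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    async def acquire(self) -> None:
        # The lock is held while sleeping, so waiters are served one at a time.
        async with self._lock:
            while True:
                self._refill()
                if self._tokens >= 1:
                    self._tokens -= 1
                    return
                # Sleep just long enough to earn the missing fraction of a token,
                # instead of spinning in a busy loop.
                await asyncio.sleep((1 - self._tokens) / self.rate)
```

With rate=50 and capacity=2, the first two acquire() calls return immediately and each later call waits roughly 1/50 of a second.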

2. Process-Global Keyed Registry

Introduced a registry mapping:

(rate, capacity) → AsyncTokenBucket

Purpose:

  • Ensures independent throttles for different model configurations
  • Shares a single bucket for identical configurations
  • Prevents cross-configuration interference within the same process

3. Configuration Updates

Added the following fields to BaseModelConfig:

  • throttle_rate (requests per second)
  • throttle_capacity (burst size)

Default:

  • throttle_rate: 0.0

Purpose: Fully opt-in behavior with no overhead when disabled.
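As a rough picture of the opt-in switch, the sketch below models only the two new fields. It assumes a dataclass-style config; the class name ThrottleFields, the throttle_enabled helper, and the default of 1 for throttle_capacity are assumptions, since the PR states a default only for throttle_rate.

```python
from dataclasses import dataclass


@dataclass
class ThrottleFields:
    """Hypothetical stand-in for the two fields added to BaseModelConfig."""

    throttle_rate: float = 0.0   # requests per second; 0.0 disables throttling (from the PR)
    throttle_capacity: int = 1   # burst size; this default value is an assumption

    @property
    def throttle_enabled(self) -> bool:
        # Illustrative helper: throttling is active only for a positive rate.
        return self.throttle_rate > 0


default_cfg = ThrottleFields()
burst_cfg = ThrottleFields(throttle_rate=2.0, throttle_capacity=5)
print(default_cfg.throttle_enabled, burst_cfg.throttle_enabled)
```

Under this reading, a config with throttle_rate=2.0 and throttle_capacity=5 allows a burst of up to five requests and then sustains about two requests per second.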

4. Integration into Request Pipeline

Throttling is enforced inside:

BaseModel._post_with_retries()

Before each external API request:

await bucket.acquire()

Purpose: Ensures all provider requests pass through a single unified rate-limiting choke point.
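The placement of the acquire call can be sketched as follows. This is not the body of BaseModel._post_with_retries: the free function post_with_retries, the send callable, the Bucket protocol, and the choice of ConnectionError as the retryable error are all placeholders for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional, Protocol


class Bucket(Protocol):
    async def acquire(self) -> None: ...


async def post_with_retries(
    send: Callable[[], Awaitable[Any]],
    bucket: Optional[Bucket] = None,
    max_attempts: int = 3,
) -> Any:
    """Hypothetical retry loop that takes a token before every outbound attempt."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        if bucket is not None:
            # Retries also pass through the bucket, so they cannot bypass the limit.
            await bucket.acquire()
        try:
            return await send()
        except ConnectionError as exc:  # placeholder for the retryable error types
            last_error = exc
            await asyncio.sleep(0)      # real backoff omitted in this sketch
    raise RuntimeError("all attempts failed") from last_error
```

Passing bucket=None models the disabled case: the loop never touches the limiter, which matches the opt-in behavior described in this PR.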

5. Test Coverage

Location: tests/unit/test_throttle.py

Added 15 unit tests covering:

  • Rate and capacity validation
  • Burst behavior
  • Token refill correctness
  • Blocking behavior under exhaustion
  • Registry reuse per configuration
  • Concurrent access safety (asyncio.Lock)
  • Backward compatibility

Backward Compatibility

No breaking changes.

  • Default configuration disables throttling
  • Existing behavior remains unchanged unless explicitly enabled

Scope

This PR addresses request-rate-based throttling only.

It does not handle API quota exhaustion errors (insufficient_quota), which stem from account quota rather than request rate and therefore cannot be resolved by slowing requests down.
