A VCR-style record/replay proxy for HTTP, SSE and WebSocket APIs — language-agnostic,
drop-in for OpenAI, Anthropic or any upstream. First run records; every run after is
instant, free and deterministic.
Every test that hits a real API is a gamble: responses drift, rate limits kick in, latency spikes, tokens cost money. buffr puts a proxy in front of any upstream and records every call to a JSON cassette. After that, your suite runs against the cassette — zero latency, zero cost, zero flakiness. The app never knows the difference.
your app → buffr → api.openai.com    (first run: records everything)
your app → buffr                     (every run after: replays from cassette)
It's a proxy, not a library — works with Python, Go, Node, Rust, anything that speaks HTTP. No fixtures, no hand-rolled mocks; the cassette is the test data.
| Mode | Behaviour |
|---|---|
| `auto` | Replay on hit, record on miss; the cassette fills itself incrementally |
| `record` | Forward everything to upstream and write to cassette |
| `replay` | Serve from cassette only, no network |
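
The three modes can be pictured with a small in-memory model. This is a conceptual sketch only, not buffr's implementation: the `handle` function, the string keys and the `fake_upstream` stand-in are all hypothetical, and returning `None` on a replay miss is an assumption made for the sketch.

```python
from typing import Callable, Dict, List, Optional

Cassette = Dict[str, str]  # request signature -> recorded response


def handle(mode: str, key: str, cassette: Cassette,
           upstream: Callable[[str], str]) -> Optional[str]:
    """Return the response for `key` under the given mode (None = miss)."""
    if mode == "record":
        # Always forward to the upstream and write the result.
        cassette[key] = upstream(key)
        return cassette[key]
    if mode == "replay":
        # Cassette only; the upstream is never called.
        return cassette.get(key)
    if mode == "auto":
        # Replay on hit, record on miss.
        if key not in cassette:
            cassette[key] = upstream(key)
        return cassette[key]
    raise ValueError(f"unknown mode: {mode}")


calls: List[str] = []


def fake_upstream(key: str) -> str:
    """Stand-in for the real API; remembers every call it receives."""
    calls.append(key)
    return f"response to {key}"


cassette: Cassette = {}
first = handle("auto", "POST /v1/chat/completions", cassette, fake_upstream)
second = handle("auto", "POST /v1/chat/completions", cassette, fake_upstream)
miss = handle("replay", "POST /v1/embeddings", cassette, fake_upstream)
print(first == second, len(calls), miss)  # True 1 None
```

The second `auto` call is served from the cassette, so the upstream is hit exactly once.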
buffr auto --target https://api.openai.com --port 8080

Point your app at `http://localhost:8080` instead of `https://api.openai.com`. Done. The cassette is auto-named `api.openai.com.json` in the current directory.
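
On the client side, the only change is the base URL. A minimal sketch using the standard library, assuming the app reads its base URL from an environment variable; `API_BASE_URL` and `build_request` are hypothetical names chosen for illustration:

```python
import json
import os
import urllib.request

# API_BASE_URL is a hypothetical variable name used only in this sketch.
BASE_URL = os.environ.get("API_BASE_URL", "http://localhost:8080")


def build_request(path: str, payload: dict,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST against whichever base URL is configured."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("/v1/chat/completions",
                    {"model": "example-model", "messages": []})
print(req.get_method(), req.full_url)
# Sending it is the usual call: urllib.request.urlopen(req)
```

A real recording run would also need whatever auth headers the upstream expects; they are left out here.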
docker run \
  -e BUFFR_MODE=auto \
  -e BUFFR_TARGET=https://api.openai.com \
  -v ./cassettes:/data \
  -p 8080:8080 \
  ghcr.io/robinbially/buffr:latest

Multiple APIs, one container
BUFFR_TARGETS gives each upstream its own port and cassette. mode and cassette are optional per entry (default auto, <host>.json):
docker run \
  -e BUFFR_TARGETS='
- target: https://api.openai.com
  port: 8081
- target: https://api.anthropic.com
  port: 8082
- target: https://api.elevenlabs.io
  port: 8083
' \
  -v ./cassettes:/data \
  -p 8081:8081 -p 8082:8082 -p 8083:8083 \
  ghcr.io/robinbially/buffr:latest

Reverse-proxy modes need a `--target` per upstream and a configurable `base_url`. That can't reach hosts hardcoded inside a library: huggingface.co downloads, vendor SDKs, wss://…/v1/realtime.
Forward-proxy mode intercepts everything the client routes through it — a transport-level recorder (like VCR.py) but language-agnostic. The client sets standard proxy env vars and trusts buffr's CA; buffr terminates TLS with on-the-fly leaf certs and records/replays per destination host. No base_url changes.
buffr proxy --auto --port 8080 # or --record / --replay
buffr ca > buffr-ca.pem # export the CA cert for the client to trust

Client side, no app code changes (httpx and requests honor these by default; aiohttp does once the session is created with `trust_env=True`):
export HTTPS_PROXY=http://buffr:8080
export HTTP_PROXY=http://buffr:8080
export NO_PROXY=localhost,127.0.0.1,database,qdrant,s3
export SSL_CERT_FILE=/path/to/buffr-ca.pem # httpx, requests, aiohttp
export REQUESTS_CA_BUNDLE=/path/to/buffr-ca.pem # requests / older stacks

The CA is minted on first start and persisted (`<data>/buffr-ca.pem` + `.key`), so the client trusts it once. HTTP, SSE and WebSocket are all intercepted; binary and gzip/br bodies are stored byte-faithfully.
Limitation: cert-pinned or HSTS-preloaded SDKs reject any MITM cert by design, so keep those on the reverse-proxy `--target` wiring. HTTP/2 is downgraded to HTTP/1.1 on the intercepted leg (egress to the real upstream may still use h2).
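
Before blaming buffr for a missed recording, it can help to confirm that the proxy variables are visible to the client process. The sketch below is a sanity check with Python's standard library only: it shows how `urllib` interprets the variables, it does not contact buffr, and other HTTP clients may parse `NO_PROXY` slightly differently.

```python
import os
import urllib.request
from unittest import mock

proxy_env = {
    "HTTPS_PROXY": "http://buffr:8080",
    "HTTP_PROXY": "http://buffr:8080",
    "NO_PROXY": "localhost,127.0.0.1,database,qdrant,s3",
}

# patch.dict with clear=True gives the check a clean, temporary environment.
with mock.patch.dict(os.environ, proxy_env, clear=True):
    proxies = urllib.request.getproxies()
    via_proxy = not urllib.request.proxy_bypass("api.openai.com")
    direct = bool(urllib.request.proxy_bypass("qdrant"))

print(proxies["https"], via_proxy, direct)
```

Here `api.openai.com` is routed through the proxy while `qdrant` is exempt because it appears in `NO_PROXY`.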
Per-host config (BUFFR_PROXY) and env vars
BUFFR_PROXY: |
  mode: auto # auto | record | replay
  bypass: [localhost, 127.0.0.1, database, qdrant, s3] # tunneled, never recorded
  hosts:
    - host: inference-shared.homeport.ai
      cassette: /data/vllm.json
      match:
        ignore: # same rules as reverse mode, per host
          - in: request.body
            pattern: 'chatcmpl-[A-Za-z0-9]{16,32}'
            replace_with: '<CHATCMPL_ID>'
            sync_response: true
    - host: google.serper.dev
      cassette: /data/serper.json
    - host: '*' # fallback for any other host
      cassette: /data/misc.json

- `bypass`: hosts (and subdomains) tunneled straight through, no TLS interception or recording, for infra/local services. The client's `NO_PROXY` is honored too.
- Unlisted hosts fall back to `'*'`, or, if absent, record to `<data>/<host>.json`. Unlisted is not the same as bypassed.
- Matching keys on method + host + path + query + (normalized) body, so a shared cassette never cross-matches between hosts.
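
The routing rules above can be sketched as a small lookup. This is an illustrative model, not buffr's code: `is_bypassed` and `resolve_cassette` are hypothetical names, and exact-match lookup for listed hosts is an assumption.

```python
from typing import Dict, List, Optional


def is_bypassed(host: str, bypass: List[str]) -> bool:
    """True if host equals a bypass entry or is a subdomain of one."""
    return any(host == b or host.endswith("." + b) for b in bypass)


def resolve_cassette(host: str, bypass: List[str], hosts: Dict[str, str],
                     data_dir: str = ".") -> Optional[str]:
    """Return the cassette path for host, or None if it is tunneled unrecorded."""
    if is_bypassed(host, bypass):
        return None                      # bypassed: no interception, no cassette
    if host in hosts:
        return hosts[host]               # explicitly listed host
    if "*" in hosts:
        return hosts["*"]                # fallback entry
    return f"{data_dir}/{host}.json"     # unlisted, no fallback: default cassette


bypass = ["localhost", "127.0.0.1", "database", "qdrant", "s3"]
hosts = {
    "inference-shared.homeport.ai": "/data/vllm.json",
    "google.serper.dev": "/data/serper.json",
    "*": "/data/misc.json",
}

print(resolve_cassette("qdrant", bypass, hosts))             # None
print(resolve_cassette("google.serper.dev", bypass, hosts))  # /data/serper.json
print(resolve_cassette("api.openai.com", bypass, hosts))     # /data/misc.json
print(resolve_cassette("api.openai.com", bypass, {}, "/data"))  # /data/api.openai.com.json
```

The last call shows why unlisted is not the same as bypassed: the host is still recorded, just to a default cassette.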
| Env | Default | Purpose |
|---|---|---|
| `BUFFR_PROXY` | — | YAML: mode, bypass list, per-host cassette + `match.ignore` |
| `BUFFR_CA_CERT` | `<data>/buffr-ca.pem` | CA cert path (also what `buffr ca` prints) |
| `BUFFR_CA_KEY` | `<data>/buffr-ca.key` | CA private key path |
| `BUFFR_DATA_DIR` | `.` | base dir for default cassettes + CA |
| Protocol | What buffr captures |
|---|---|
| 🌐 HTTP | Request + response, any method, any path |
| ⚡ SSE (`text/event-stream`) | Each chunk with original inter-chunk timing |
| 🔌 WebSocket | Bidirectional frames in order, with per-frame delays |
Cassettes are plain JSON — readable in diffs, editable by hand.
All flags have environment variable equivalents; flags take precedence.
| Flag | Env | Default |
|---|---|---|
| `--target` | `BUFFR_TARGET` | — |
| `--port` | `BUFFR_PORT` | `8080` |
| `--cassette` | `BUFFR_CASSETTE` | `<target-host>.json` |
| (subcommand) | `BUFFR_MODE` | — |
Faster replays (BUFFR_REPLAY_NODELAY)
buffr records the wall-clock delay before each SSE chunk / WebSocket frame. By default these delays are dropped on replay so chunks/frames are emitted back-to-back — payloads are identical, only the inter-chunk timing is gone. This keeps replays fast instead of re-spending the original generation time (often seconds per call) on every run.
Set BUFFR_REPLAY_NODELAY=0 to reproduce the recorded cadence faithfully — do this when the streaming timing itself is under test.
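
The trade-off can be illustrated with a toy replay loop. The `(delay, payload)` tuples and the `replay` function are hypothetical; buffr's cassette layout and internals may differ.

```python
import time
from typing import Iterator, List, Tuple

# Hypothetical in-memory shape: (recorded delay in seconds, payload).
Chunk = Tuple[float, str]


def replay(chunks: List[Chunk], nodelay: bool = True) -> Iterator[str]:
    """Yield payloads in order, optionally waiting out each recorded delay."""
    for delay, payload in chunks:
        if not nodelay:
            time.sleep(delay)
        yield payload


recorded = [(0.05, "data: Hel"), (0.05, "data: lo"), (0.05, "data: [DONE]")]

start = time.perf_counter()
fast = list(replay(recorded))                     # default: delays dropped
fast_elapsed = time.perf_counter() - start

start = time.perf_counter()
faithful = list(replay(recorded, nodelay=False))  # recorded cadence kept
faithful_elapsed = time.perf_counter() - start

print(fast == faithful, fast_elapsed < faithful_elapsed)  # True True
```

Both runs yield identical payloads; only the elapsed time differs.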
Matching across non-deterministic requests (match.ignore)
When a request body or path carries per-run noise (a run ID, UUID, timestamp), no cassette entry ever matches and the hit rate collapses. match.ignore rewrites those substrings before matching — the same rule runs on the recorded and the live request, so they normalize to the same signature.
BUFFR_TARGETS: |
  - target: http://192.168.178.27:1234
    port: 8083
    mode: auto
    cassette: /data/lm-studio.json
    match:
      ignore:
        - in: request.body
          pattern: '/runs/\d{8}-\d{6}-\d{3}/'
          replace_with: '/runs/<RUN_ID>/'
          sync_response: true # echo the live run_id back in the response
        - in: request.path
          pattern: '/tasks/[0-9a-f-]{36}'
          replace_with: '/tasks/<TASK_ID>'

- `in`: `request.body` or `request.path`
- `pattern`: Go regex (RE2)
- `replace_with`: literal replacement (use a placeholder like `<RUN_ID>` for readability)
- `sync_response` (default `false`): when the upstream echoes the ID back, buffr records the matched value and swaps in the live request's value at replay time, so the client sees its own ID, not the frozen one.
Invalid rules log a warning and are skipped — they don't take the proxy down.
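
To see why normalization restores the hit rate, here is a rough model of one body rule and of `sync_response`. It uses Python's `re` as a stand-in for RE2 (the pattern shown is valid in both); the function names are hypothetical and the swap logic is an assumption about the behaviour described above, not buffr's code.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (pattern, replace_with)

RULES: List[Rule] = [(r"/runs/\d{8}-\d{6}-\d{3}/", "/runs/<RUN_ID>/")]


def normalize(text: str, rules: List[Rule]) -> str:
    """Apply every rule so recorded and live requests share one signature."""
    for pattern, replace_with in rules:
        # A callable replacement keeps replace_with literal (no \1 expansion).
        text = re.sub(pattern, lambda _m, r=replace_with: r, text)
    return text


def first_match(text: str, pattern: str) -> Optional[str]:
    """Return the first substring matching pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def sync_response(response: str, recorded_req: str, live_req: str,
                  pattern: str) -> str:
    """Swap the recorded ID in the response for the live request's ID."""
    old = first_match(recorded_req, pattern)
    new = first_match(live_req, pattern)
    if old is None or new is None:
        return response
    return response.replace(old, new)


recorded_req = '{"log":"/runs/20240101-120000-001/out.txt"}'
live_req = '{"log":"/runs/20240315-093000-042/out.txt"}'
recorded_resp = '{"saved":"/runs/20240101-120000-001/out.txt"}'

print(normalize(recorded_req, RULES) == normalize(live_req, RULES))  # True
print(sync_response(recorded_resp, recorded_req, live_req, RULES[0][0]))
# {"saved":"/runs/20240315-093000-042/out.txt"}
```

The two request bodies differ only in the run ID, so they normalize to the same signature, and the replayed response carries the live run's ID.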
WebSocket example & log format
# Record once against the real API
import websocket
ws = websocket.create_connection("ws://localhost:8080/v1/realtime")
ws.send('{"type":"session.update","session":{"modalities":["text"]}}')
print(ws.recv())
ws.close()
# Replay in tests: same code, no network

Every request logs method, path, status, duration and source:
time=12:34:56.123 level=INFO msg=listening mode=auto addr=:8080 cassette=api.openai.com.json
time=12:34:57.045 level=INFO msg="POST /v1/chat/completions" status=200 dur=823ms src=upstream
time=12:34:58.891 level=INFO msg="POST /v1/chat/completions" status=200 dur=2ms src=cassette
time=12:34:59.001 level=WARN msg="POST /v1/embeddings" src=miss
time=12:35:00.450 level=INFO msg="WS /v1/realtime" frames=14 dur=3.2s src=cassette
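
Because the log lines are plain `key=value` pairs, a CI step can tally the `src` field, for example to fail a replay-only run on any `miss`. A small sketch assuming the format shown above; `parse_line` and `count_sources` are hypothetical helpers, not part of buffr.

```python
import shlex
from collections import Counter
from typing import Dict, Iterable


def parse_line(line: str) -> Dict[str, str]:
    """Split one key=value log line; shlex handles the quoted msg field."""
    fields: Dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_sources(lines: Iterable[str]) -> Counter:
    """Count src= values (upstream, cassette, miss) across request lines."""
    return Counter(f["src"] for f in map(parse_line, lines) if "src" in f)


log = [
    'time=12:34:56.123 level=INFO msg=listening mode=auto addr=:8080 cassette=api.openai.com.json',
    'time=12:34:57.045 level=INFO msg="POST /v1/chat/completions" status=200 dur=823ms src=upstream',
    'time=12:34:58.891 level=INFO msg="POST /v1/chat/completions" status=200 dur=2ms src=cassette',
    'time=12:34:59.001 level=WARN msg="POST /v1/embeddings" src=miss',
    'time=12:35:00.450 level=INFO msg="WS /v1/realtime" frames=14 dur=3.2s src=cassette',
]

counts = count_sources(log)
print(dict(counts))  # {'upstream': 1, 'cassette': 2, 'miss': 1}
```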
go test ./...
go run ./cmd/buffr auto --target https://api.openai.com

MIT