buffr

Record API traffic once. Replay it forever.

A VCR-style record/replay proxy for HTTP, SSE and WebSocket APIs — language-agnostic,
drop-in for OpenAI, Anthropic or any upstream. First run records; every run after is
instant, free and deterministic.

Every test that hits a real API is a gamble: responses drift, rate limits kick in, latency spikes, and tokens cost money. buffr puts a proxy in front of any upstream and records every call to a JSON cassette. After that, your suite runs against the cassette: no network round-trip, no API cost, no upstream flakiness. The app never knows the difference.

your app → buffr → api.openai.com   first run: records everything
your app → buffr                    every run after: replays from cassette

It's a proxy, not a library — works with Python, Go, Node, Rust, anything that speaks HTTP. No fixtures, no hand-rolled mocks; the cassette is the test data.

Modes

Mode	Behaviour
`auto`	Replay on hit, record on miss — cassette fills itself incrementally
`record`	Forward everything to upstream and write to cassette
`replay`	Serve from cassette only, no network
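
The table can be read as a small decision function. The sketch below only models the table and is not buffr's implementation: `serve`, the dict used as a cassette, and the `fetch` callback are hypothetical names. What `replay` returns to the client on a miss is not stated here, so the sketch simply raises.

```python
from typing import Callable, Dict


def serve(mode: str, cassette: Dict[str, str], key: str,
          fetch: Callable[[str], str]) -> str:
    """Illustrative model of auto / record / replay (not buffr's code)."""
    if mode == "record":
        # Always forward to the upstream and write the result to the cassette.
        cassette[key] = fetch(key)
        return cassette[key]
    if mode == "replay":
        # Cassette only; the upstream is never contacted.
        if key not in cassette:
            raise LookupError(f"cassette miss: {key}")  # assumed behaviour
        return cassette[key]
    if mode == "auto":
        # Replay on hit, record on miss.
        if key not in cassette:
            cassette[key] = fetch(key)
        return cassette[key]
    raise ValueError(f"unknown mode: {mode}")
```

In `auto`, the second call with the same key never reaches `fetch`, which is why the cassette fills itself incrementally.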

Quickstart

buffr auto --target https://api.openai.com --port 8080

Point your app at http://localhost:8080 instead of https://api.openai.com. Done. The cassette is auto-named api.openai.com.json in the current directory.
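
One common way to make that switch is to read the base URL from the environment, so the same code talks to buffr in tests and to the real API elsewhere. A minimal standard-library sketch; `API_BASE_URL` and `build_chat_request` are hypothetical names, not something buffr defines.

```python
import json
import os
import urllib.request


def build_chat_request(payload: dict, api_key: str = "test-key") -> urllib.request.Request:
    # API_BASE_URL is a hypothetical variable name; the default is the
    # buffr address used in the quickstart command.
    base = os.environ.get("API_BASE_URL", "http://localhost:8080").rstrip("/")
    return urllib.request.Request(
        base + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to send it. Clients that already accept a base URL setting need no wrapper like this.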

Docker

docker run \
  -e BUFFR_MODE=auto \
  -e BUFFR_TARGET=https://api.openai.com \
  -v ./cassettes:/data \
  -p 8080:8080 \
  ghcr.io/robinbially/buffr:latest

Multiple APIs, one container

BUFFR_TARGETS gives each upstream its own port and cassette. mode and cassette are optional per entry (default auto, <host>.json):

docker run \
  -e BUFFR_TARGETS='
    - target: https://api.openai.com
      port: 8081
    - target: https://api.anthropic.com
      port: 8082
    - target: https://api.elevenlabs.io
      port: 8083
  ' \
  -v ./cassettes:/data \
  -p 8081:8081 -p 8082:8082 -p 8083:8083 \
  ghcr.io/robinbially/buffr:latest

Forward-proxy mode (catch-all)

Reverse-proxy mode needs a --target per upstream and a client whose base_url is configurable. That leaves out hosts hardcoded inside a library: huggingface.co downloads, vendor SDKs, wss://…/v1/realtime.

Forward-proxy mode intercepts everything the client routes through it — a transport-level recorder (like VCR.py) but language-agnostic. The client sets standard proxy env vars and trusts buffr's CA; buffr terminates TLS with on-the-fly leaf certs and records/replays per destination host. No base_url changes.

buffr proxy --auto --port 8080      # or --record / --replay
buffr ca > buffr-ca.pem             # export the CA cert for the client to trust

Client side, no app code changes (httpx and requests honor these by default; aiohttp reads the proxy variables only when the session is created with trust_env=True):

export HTTPS_PROXY=http://buffr:8080
export HTTP_PROXY=http://buffr:8080
export NO_PROXY=localhost,127.0.0.1,database,qdrant,s3
export SSL_CERT_FILE=/path/to/buffr-ca.pem         # httpx, aiohttp
export REQUESTS_CA_BUNDLE=/path/to/buffr-ca.pem    # requests / older stacks

The CA is minted on first start and persisted (<data>/buffr-ca.pem + .key), so the client trusts it once. HTTP, SSE and WebSocket are all intercepted; binary and gzip/br bodies are stored byte-faithfully.

Limitation: cert-pinned or HSTS-preloaded SDKs reject any MITM cert by design — keep those on the reverse-proxy --target wiring. HTTP/2 is downgraded to HTTP/1.1 on the intercepted leg (egress to the real upstream may still use h2).

Per-host config (BUFFR_PROXY) and env vars

BUFFR_PROXY: |
  mode: auto                       # auto | record | replay
  bypass: [localhost, 127.0.0.1, database, qdrant, s3]   # tunneled, never recorded
  hosts:
    - host: inference-shared.homeport.ai
      cassette: /data/vllm.json
      match:
        ignore:                    # same rules as reverse mode, per host
          - in: request.body
            pattern: 'chatcmpl-[A-Za-z0-9]{16,32}'
            replace_with: '<CHATCMPL_ID>'
            sync_response: true
    - host: google.serper.dev
      cassette: /data/serper.json
    - host: '*'                    # fallback for any other host
      cassette: /data/misc.json

bypass — hosts (and subdomains) tunneled straight through, no TLS interception or recording, for infra/local services. The client's NO_PROXY is honored too.
Unlisted hosts fall back to '*', or — if absent — record to <data>/<host>.json. Unlisted is not the same as bypassed.
Matching keys on method + host + path + query + (normalized) body, so a shared cassette never cross-matches between hosts.
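
The routing rules above can be sketched as a lookup function. This is an illustration under assumptions, not buffr's code: `resolve_cassette` is a hypothetical name, `hosts` entries are compared by exact host name, and an entry without a `cassette` key falls back to the default path.

```python
from typing import Optional


def resolve_cassette(host: str, cfg: dict, data_dir: str = ".") -> Optional[str]:
    """Return the cassette path for host, or None when the host is bypassed."""
    default = f"{data_dir}/{host}.json"
    for bypassed in cfg.get("bypass", []):
        # Bypass covers the listed host and its subdomains.
        if host == bypassed or host.endswith("." + bypassed):
            return None
    fallback = None
    for entry in cfg.get("hosts", []):
        if entry["host"] == host:  # exact match is an assumption
            return entry.get("cassette", default)
        if entry["host"] == "*":
            fallback = entry.get("cassette", default)
    # Unlisted hosts use the '*' entry if present, else <data>/<host>.json.
    return fallback if fallback is not None else default
```

The `None` return is what separates bypassed from unlisted: a bypassed host is never recorded, while an unlisted host always resolves to some cassette.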

Env	Default	Purpose
`BUFFR_PROXY`	—	YAML: mode, bypass list, per-host cassette + `match.ignore`
`BUFFR_CA_CERT`	`<data>/buffr-ca.pem`	CA cert path (also what `buffr ca` prints)
`BUFFR_CA_KEY`	`<data>/buffr-ca.key`	CA private key path
`BUFFR_DATA_DIR`	`.`	base dir for default cassettes + CA

What gets recorded

Protocol	What buffr captures
🌐 HTTP	Request + response, any method, any path
⚡ SSE (`text/event-stream`)	Each chunk with original inter-chunk timing
🔌 WebSocket	Bidirectional frames in order, with per-frame delays

Cassettes are plain JSON — readable in diffs, editable by hand.

Configuration

All flags have environment variable equivalents; flags take precedence.

Flag	Env	Default
`--target`	`BUFFR_TARGET`	—
`--port`	`BUFFR_PORT`	`8080`
`--cassette`	`BUFFR_CASSETTE`	`<target-host>.json`
(subcommand)	`BUFFR_MODE`	—
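
The precedence rule and the default cassette name can be sketched as follows. Both helpers are hypothetical illustrations; in particular, reading `<target-host>` as the hostname without scheme or port is an assumption.

```python
import os
from urllib.parse import urlparse


def resolve_setting(flag_value, env_name, default=None, env=os.environ):
    """A flag beats the environment variable, which beats the default."""
    if flag_value is not None:
        return flag_value
    return env.get(env_name, default)


def default_cassette(target: str) -> str:
    # Assumption: "<target-host>" is the hostname without scheme or port.
    return f"{urlparse(target).hostname}.json"
```

For the quickstart target, `default_cassette("https://api.openai.com")` gives `api.openai.com.json`, matching the auto-named cassette described there.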

Faster replays (BUFFR_REPLAY_NODELAY)

buffr records the wall-clock delay before each SSE chunk / WebSocket frame. By default these delays are dropped on replay so chunks/frames are emitted back-to-back — payloads are identical, only the inter-chunk timing is gone. This keeps replays fast instead of re-spending the original generation time (often seconds per call) on every run.

Set BUFFR_REPLAY_NODELAY=0 to reproduce the recorded cadence faithfully — do this when the streaming timing itself is under test.
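
The effect of the switch can be pictured with a small generator. This is a sketch of the idea only: the `(delay_seconds, payload)` tuples are a hypothetical in-memory layout, not the cassette schema, and `replay_chunks` is not a buffr function.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def replay_chunks(recorded: Iterable[Tuple[float, bytes]],
                  nodelay: bool = True,
                  sleep: Callable[[float], None] = time.sleep) -> Iterator[bytes]:
    """Yield recorded payloads, optionally waiting the recorded delay first."""
    for delay_seconds, payload in recorded:
        if not nodelay:
            # Faithful cadence: wait as long as the original chunk took.
            sleep(delay_seconds)
        yield payload
```

Either way the payload sequence is the same; only the time spent between chunks differs.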

Matching across non-deterministic requests (match.ignore)

When a request body or path carries per-run noise (a run ID, UUID, timestamp), no cassette entry ever matches and the hit rate collapses. match.ignore rewrites those substrings before matching — the same rule runs on the recorded and the live request, so they normalize to the same signature.

BUFFR_TARGETS: |
  - target: http://192.168.178.27:1234
    port: 8083
    mode: auto
    cassette: /data/lm-studio.json
    match:
      ignore:
        - in: request.body
          pattern: '/runs/\d{8}-\d{6}-\d{3}/'
          replace_with: '/runs/<RUN_ID>/'
          sync_response: true   # echo the live run_id back in the response
        - in: request.path
          pattern: '/tasks/[0-9a-f-]{36}'
          replace_with: '/tasks/<TASK_ID>'

in — request.body or request.path
pattern — Go regex (RE2)
replace_with — literal replacement (use a placeholder like <RUN_ID> for readability)
sync_response (default false) — when the upstream echoes the ID back, buffr records the matched value and swaps in the live request's value at replay time, so the client sees its own ID, not the frozen one.

Invalid rules log a warning and are skipped — they don't take the proxy down.
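
To see why normalization restores the hit rate, here is a rough model of the rules from the example. It is a sketch, not buffr's matcher: the function names are hypothetical, and it uses Python's `re` rather than RE2 (these particular patterns are valid in both).

```python
import re

RULES = [  # patterns copied from the BUFFR_TARGETS example
    {"in": "request.body", "pattern": r"/runs/\d{8}-\d{6}-\d{3}/",
     "replace_with": "/runs/<RUN_ID>/"},
    {"in": "request.path", "pattern": r"/tasks/[0-9a-f-]{36}",
     "replace_with": "/tasks/<TASK_ID>"},
]


def normalize(field: str, text: str, rules=RULES) -> str:
    """Apply every rule for this field; invalid patterns are skipped."""
    for rule in rules:
        if rule["in"] != field:
            continue
        try:
            pattern = re.compile(rule["pattern"])
        except re.error:
            continue  # an invalid rule is skipped, not fatal
        # A function replacement keeps replace_with literal.
        text = pattern.sub(lambda _m, r=rule["replace_with"]: r, text)
    return text


def signature(method, host, path, query, body, rules=RULES):
    """Matching key: method + host + path + query + normalized body."""
    return (method, host, normalize("request.path", path, rules), query,
            normalize("request.body", body, rules))


def sync_response(response_body: str, recorded_value: str, live_value: str) -> str:
    # Swap the frozen recorded ID for the one the live request sent.
    return response_body.replace(recorded_value, live_value)
```

Two requests that differ only in run ID or task ID produce the same `signature`, so the second one hits the cassette entry recorded for the first.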

WebSocket example & log format

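# Requires the websocket-client package (pip install websocket-client).
import websocket

# First run: buffr forwards to the real API and records the frames.
ws = websocket.create_connection("ws://localhost:8080/v1/realtime")
try:
    ws.send('{"type":"session.update","session":{"modalities":["text"]}}')
    print(ws.recv())
finally:
    ws.close()  # always release the connection, even if recv() fails
# Later runs replay from the cassette: same code, no network.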

Every request logs method, path, status, duration and source:

time=12:34:56.123 level=INFO msg=listening mode=auto addr=:8080 cassette=api.openai.com.json
time=12:34:57.045 level=INFO msg="POST /v1/chat/completions" status=200 dur=823ms src=upstream
time=12:34:58.891 level=INFO msg="POST /v1/chat/completions" status=200 dur=2ms   src=cassette
time=12:34:59.001 level=WARN msg="POST /v1/embeddings"                            src=miss
time=12:35:00.450 level=INFO msg="WS /v1/realtime"           frames=14 dur=3.2s   src=cassette
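
Because the lines are key=value pairs, a test harness can tally the `src` field, for example to fail a CI run on any miss. A small sketch, assuming the format shown above holds for every line; `parse_line` and `count_sources` are hypothetical helpers.

```python
import shlex
from collections import Counter


def parse_line(line: str) -> dict:
    """Split a key=value log line; shlex keeps the quoted msg together."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_sources(lines) -> Counter:
    """Count log lines per src value (upstream, cassette, miss)."""
    return Counter(f["src"] for f in map(parse_line, lines) if "src" in f)
```

A check such as `count_sources(lines)["miss"] == 0` then guards against requests that are missing from the cassette.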

Development

go test ./...
go run ./cmd/buffr auto --target https://api.openai.com

License

MIT
