Skip to content

feat: chaos for proxy mode (drop, disconnect, malformed)#122

Open
marthakelly wants to merge 13 commits intomainfrom
feat/chaos-proxy-phase2-v2
Open

feat: chaos for proxy mode (drop, disconnect, malformed)#122
marthakelly wants to merge 13 commits intomainfrom
feat/chaos-proxy-phase2-v2

Conversation

@marthakelly
Copy link
Copy Markdown

@marthakelly marthakelly commented Apr 22, 2026

Summary

Chaos (drop, disconnect, malformed) now applies in proxy mode. Previously chaos was skipped for proxied requests, making them unrealistically reliable.

What it does

  • Pre-flight drop/disconnect pre-empt upstream (nothing is contacted).
  • Post-response malformed proxies to upstream, captures the response, replaces the body with invalid JSON before relay.
  • Chaos is rolled once per request, after fixture match, with resolved config (headers > fixture.chaos > server defaults).

Design highlights

  • applyChaos split into roll+dispatch (applyChaosAction) so callers that pre-roll can't re-roll by accident.
  • proxyAndRecord returns "not_configured" | "relayed" | "handled_by_hook"
  • JournalEntry.response.source: "fixture" | "proxy" disambiguates null-fixture entries.
  • aimock_chaos_triggered_total metric is tagged by both action and source.

Scope

In: malformed / drop / disconnect for proxy mode, non-streaming.
Deferred: status_override, truncate, streaming (SSE/NDJSON) mutation.

Tests

  • Post-response malformed: upstream IS called, body is invalid JSON, journal records chaosAction exactly once.
  • Single-roll semantics pinned: drop journals the matched fixture + increments match count (disconnect symmetric).
  • Binary-safe hook: beforeWriteResponse receives raw upstream bytes byte-equal to source.
  • Proxy failure integration: 502 reaches client, journal correct, chaos rolled-but-upstream-failed journals no chaosAction.
  • Content-type preserved on no-chaos replay.

pnpm format:checkpnpm lintpnpm test ✓ (2445 passed)

@marthakelly marthakelly changed the base branch from feat/chaos-proxy-phase1 to main April 22, 2026 20:51
@marthakelly marthakelly changed the title feat: post-response chaos for proxy mode (phase 2) feat: chaos for proxy mode (drop, disconnect, malformed) Apr 22, 2026
Previously, chaos in proxy mode only supported pre-flight actions — drop
and disconnect fired before upstream was ever contacted. Malformed was
evaluated pre-flight too, so it short-circuited the proxy call and never
actually exercised the upstream path. This landed malformed as a
post-response mutation so the proxy does hit upstream, captures the
response, and then (probabilistically) replaces the body with invalid
JSON before relay.

Design:

- Split applyChaos into a roll+dispatch pair. applyChaosAction dispatches
  a known action without re-rolling the dice, so callers that pre-roll
  to decide phase-appropriate dispatch can't drift from their own
  decision.
- Gate pre-flight chaos in handleCompletions to drop/disconnect only.
  Malformed flows through to either fixture post-match or proxy
  post-response.
- Gate post-match applyChaos on fixture presence (previously ran
  unconditionally). Removes a stale roll on the no-fixture/proxy path.
- Add an optional beforeWriteResponse hook to proxyAndRecord. The hook
  receives the captured upstream response and can either relay it
  normally (return false) or replace it (return true). Non-breaking for
  the other nine proxyAndRecord callers — options defaults to undefined.
- handleCompletions uses the hook to apply post-response malformed
  chaos and journal it, with a chaosHandled flag that suppresses the
  outer journal.add so the request records exactly one entry.

Tests (red/green verified — malformed test fails on pre-refactor code
because upstream is never called, then passes after the changes):

- applies malformed chaos AFTER proxying (upstream called, body
  corrupted, journaled exactly once with chaosAction).
- preserves upstream content-type on replay when no chaos fires.

All 2440 existing tests continue to pass.
Phase 1 noted that pre-flight + post-match evaluation was a tradeoff
to preserve fixture-level chaos overrides without a larger refactor,
and that the double-evaluation would be removed in phase 2. This lands
that change.

Match the fixture first, then roll chaos once with the resolved config
(headers > fixture.chaos > server defaults). Dispatch by action + path:
drop/disconnect apply immediately; malformed on the fixture path writes
invalid JSON instead of the fixture response; malformed on the proxy
path goes through the beforeWriteResponse hook, which now closes over
the pre-rolled action rather than re-evaluating.

Observable behavior change: when drop/disconnect fires on a request
that matches a fixture, the journal entry now records the matched
fixture (not null) and the fixture's match count is incremented. Under
the old double-roll, whether the fixture was recorded depended on
which phase rolled first — an inconsistency that's now gone.

New test pins the single-roll semantics via the journal entry's
fixture field on a drop.
Review feedback on the phase-2 draft surfaced four legitimate issues;
this addresses them by making the API boundaries more explicit.

1. chaosHandled ghost flag → ProxyOutcome return type.
   proxyAndRecord previously returned boolean, so the server.ts caller
   had to share a mutated closure variable with the beforeWriteResponse
   hook to know whether the hook already journaled. Replaced boolean
   with "not_configured" | "relayed" | "handled_by_hook". The caller
   pattern-matches on the outcome instead of reading a flag, and the
   other thirteen proxyAndRecord callers (bedrock, cohere, ollama,
   speech, images, video, gemini, messages, embeddings, transcription,
   responses, bedrock-converse, multimedia-record test) are updated
   mechanically from `if (proxied)` to `if (outcome !== "not_configured")`.
   Those callers were also latently hazardous — any future refactor
   returning a truthy non-boolean would silently break them.

2. Logic leakage in the chaos hook → caller gates hook existence.
   Previously the hook did `if (chaosAction !== "malformed") return false`
   as its first line. Now the hook is only passed when the action is
   malformed, so inside the hook it can unconditionally apply + return
   true. Decision lives at the call site, executor lives in the hook.

3. isBinary dead code → removed from ProxyCapturedResponse.
   Was added preemptively, never read by the one hook that consumes
   the captured response. YAGNI.

4. Journal source ambiguity → added response.source: "fixture" | "proxy".
   A null fixture in a chaos journal entry was ambiguous (proxy path?
   unmatched 404?). applyChaosAction now requires a source argument
   (required, not optional — TypeScript won't let a branch silently
   omit it). The applyChaos wrapper used by the ten non-chat handlers
   defaults to "fixture" since those callers only apply chaos in
   fixture-serving contexts. Chat completions passes the appropriate
   source at every site.

Not addressed:
- Hardcoded Content-Type: application/json in applyChaosAction's
  malformed branch. For the current PR scope (chat completions, always
  JSON upstream) this is correct; for phase 3 (streaming / non-JSON)
  the dispatch will need to branch on content type.
- Full journal schema consistency (source field on every handler's
  journal.add path, not just chaos entries). Non-chat handlers set
  source via applyChaos's "fixture" default; their non-chaos journal
  paths stay unchanged here.

All 2441 tests pass (previous 2440 + one pinning the single-roll
semantics). Format + lint clean.
Two small points from review:

1. The chaos hook in server.ts declared a `response` parameter it
   didn't use. Dropped the parameter entirely rather than prefixing
   with `_` (the project's ESLint config doesn't honor that pattern).
   TypeScript allows a callback to declare fewer params than its
   signature type, which is itself a valid signal that the hook
   doesn't consume the response. A comment notes where phase 3 will
   need it.

2. Added a proxyAndRecord unit test that stands up a real HTTP
   upstream returning non-UTF8 bytes (0xff 0xfe 0xfd ...) and
   verifies the `beforeWriteResponse` hook receives a Buffer
   byte-equal to the upstream response. Pins the refactor's claim
   that the hook sees raw upstream bytes rather than a
   UTF-8-decoded-then-re-encoded view.
Unit tests in recorder.test.ts prove proxyAndRecord writes 502 on
upstream failure. These two integration tests pin the corresponding
server-level behavior:

1. Proxy failure (no chaos): client gets 502, journal records one
   entry with status=502, fixture=null, source="proxy",
   chaosAction=undefined. Confirms handleCompletions's "relayed"
   branch journals the failure correctly and doesn't hang.

2. Proxy failure with malformedRate=1.0: client still gets 502
   (not a malformed-JSON body), journal has no chaosAction. Pins
   that when upstream fails before the hook is invoked, chaos is
   "rolled but never fired" — the journal reflects what happened,
   not what was intended.
Four small changes from review:

1. Tag aimock_chaos_triggered_total with source. Observability wanted
   to see "N% of chaos fires are fixture vs proxy"; the label was
   already plumbed through applyChaosAction, just wasn't on the
   metric itself.

2. Comment at the collapse/hook site in recorder.ts noting that the
   upstream response is buffered in memory. Fine for chat traffic;
   revisit if this path ever proxies large or long-lived streams.

3. Comment at the fixture-write in recorder.ts clarifying that the
   persisted artifact is always the real upstream response, even
   when chaos later mutates the relay. Chaos is a runtime
   decoration; the recorded artifact must stay truthful so replay
   sees what upstream said.

4. Disconnect journal-pinning test in chaos-fixture-mode.test.ts,
   symmetric to the existing drop test. Disconnect's status: 0 is
   an unusual shape worth a regression pin.
@marthakelly marthakelly force-pushed the feat/chaos-proxy-phase2-v2 branch from 12abf48 to 696522a Compare April 23, 2026 01:21
@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented Apr 23, 2026

Open in StackBlitz

npm i https://pkg.pr.new/@copilotkit/aimock@122

commit: 696522a

@marthakelly marthakelly marked this pull request as ready for review April 23, 2026 01:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant