feat: chaos for proxy mode (drop, disconnect, malformed)#122
Open
marthakelly wants to merge 13 commits intomainfrom
Open
feat: chaos for proxy mode (drop, disconnect, malformed)#122marthakelly wants to merge 13 commits intomainfrom
marthakelly wants to merge 13 commits intomainfrom
Conversation
Previously, chaos in proxy mode only supported pre-flight actions — drop and disconnect fired before upstream was ever contacted. Malformed was evaluated pre-flight too, so it short-circuited the proxy call and never actually exercised the upstream path. This landed malformed as a post-response mutation so the proxy does hit upstream, captures the response, and then (probabilistically) replaces the body with invalid JSON before relay. Design: - Split applyChaos into a roll+dispatch pair. applyChaosAction dispatches a known action without re-rolling the dice, so callers that pre-roll to decide phase-appropriate dispatch can't drift from their own decision. - Gate pre-flight chaos in handleCompletions to drop/disconnect only. Malformed flows through to either fixture post-match or proxy post-response. - Gate post-match applyChaos on fixture presence (previously ran unconditionally). Removes a stale roll on the no-fixture/proxy path. - Add an optional beforeWriteResponse hook to proxyAndRecord. The hook receives the captured upstream response and can either relay it normally (return false) or replace it (return true). Non-breaking for the other nine proxyAndRecord callers — options defaults to undefined. - handleCompletions uses the hook to apply post-response malformed chaos and journal it, with a chaosHandled flag that suppresses the outer journal.add so the request records exactly one entry. Tests (red/green verified — malformed test fails on pre-refactor code because upstream is never called, then passes after the changes): - applies malformed chaos AFTER proxying (upstream called, body corrupted, journaled exactly once with chaosAction). - preserves upstream content-type on replay when no chaos fires. All 2440 existing tests continue to pass.
Phase 1 noted that pre-flight + post-match evaluation was a tradeoff to preserve fixture-level chaos overrides without a larger refactor, and that the double-evaluation would be removed in phase 2. This lands that change. Match the fixture first, then roll chaos once with the resolved config (headers > fixture.chaos > server defaults). Dispatch by action + path: drop/disconnect apply immediately; malformed on the fixture path writes invalid JSON instead of the fixture response; malformed on the proxy path goes through the beforeWriteResponse hook, which now closes over the pre-rolled action rather than re-evaluating. Observable behavior change: when drop/disconnect fires on a request that matches a fixture, the journal entry now records the matched fixture (not null) and the fixture's match count is incremented. Under the old double-roll, whether the fixture was recorded depended on which phase rolled first — an inconsistency that's now gone. New test pins the single-roll semantics via the journal entry's fixture field on a drop.
Review feedback on the phase-2 draft surfaced four legitimate issues; this addresses them by making the API boundaries more explicit. 1. chaosHandled ghost flag → ProxyOutcome return type. proxyAndRecord previously returned boolean, so the server.ts caller had to share a mutated closure variable with the beforeWriteResponse hook to know whether the hook already journaled. Replaced boolean with "not_configured" | "relayed" | "handled_by_hook". The caller pattern-matches on the outcome instead of reading a flag, and the other thirteen proxyAndRecord callers (bedrock, cohere, ollama, speech, images, video, gemini, messages, embeddings, transcription, responses, bedrock-converse, multimedia-record test) are updated mechanically from `if (proxied)` to `if (outcome !== "not_configured")`. Those callers were also latently hazardous — any future refactor returning a truthy non-boolean would silently break them. 2. Logic leakage in the chaos hook → caller gates hook existence. Previously the hook did `if (chaosAction !== "malformed") return false` as its first line. Now the hook is only passed when the action is malformed, so inside the hook it can unconditionally apply + return true. Decision lives at the call site, executor lives in the hook. 3. isBinary dead code → removed from ProxyCapturedResponse. Was added preemptively, never read by the one hook that consumes the captured response. YAGNI. 4. Journal source ambiguity → added response.source: "fixture" | "proxy". A null fixture in a chaos journal entry was ambiguous (proxy path? unmatched 404?). applyChaosAction now requires a source argument (required, not optional — TypeScript won't let a branch silently omit it). The applyChaos wrapper used by the ten non-chat handlers defaults to "fixture" since those callers only apply chaos in fixture-serving contexts. Chat completions passes the appropriate source at every site. Not addressed: - Hardcoded Content-Type: application/json in applyChaosAction's malformed branch. For the current PR scope (chat completions, always JSON upstream) this is correct; for phase 3 (streaming / non-JSON) the dispatch will need to branch on content type. - Full journal schema consistency (source field on every handler's journal.add path, not just chaos entries). Non-chat handlers set source via applyChaos's "fixture" default; their non-chaos journal paths stay unchanged here. All 2441 tests pass (previous 2440 + one pinning the single-roll semantics). Format + lint clean.
Two small points from review: 1. The chaos hook in server.ts declared a `response` parameter it didn't use. Dropped the parameter entirely rather than prefixing with `_` (the project's ESLint config doesn't honor that pattern). TypeScript allows a callback to declare fewer params than its signature type, which is itself a valid signal that the hook doesn't consume the response. A comment notes where phase 3 will need it. 2. Added a proxyAndRecord unit test that stands up a real HTTP upstream returning non-UTF8 bytes (0xff 0xfe 0xfd ...) and verifies the `beforeWriteResponse` hook receives a Buffer byte-equal to the upstream response. Pins the refactor's claim that the hook sees raw upstream bytes rather than a UTF-8-decoded-then-re-encoded view.
Unit tests in recorder.test.ts prove proxyAndRecord writes 502 on upstream failure. These two integration tests pin the corresponding server-level behavior: 1. Proxy failure (no chaos): client gets 502, journal records one entry with status=502, fixture=null, source="proxy", chaosAction=undefined. Confirms handleCompletions's "relayed" branch journals the failure correctly and doesn't hang. 2. Proxy failure with malformedRate=1.0: client still gets 502 (not a malformed-JSON body), journal has no chaosAction. Pins that when upstream fails before the hook is invoked, chaos is "rolled but never fired" — the journal reflects what happened, not what was intended.
Four small changes from review: 1. Tag aimock_chaos_triggered_total with source. Observability wanted to see "N% of chaos fires are fixture vs proxy"; the label was already plumbed through applyChaosAction, just wasn't on the metric itself. 2. Comment at the collapse/hook site in recorder.ts noting that the upstream response is buffered in memory. Fine for chat traffic; revisit if this path ever proxies large or long-lived streams. 3. Comment at the fixture-write in recorder.ts clarifying that the persisted artifact is always the real upstream response, even when chaos later mutates the relay. Chaos is a runtime decoration; the recorded artifact must stay truthful so replay sees what upstream said. 4. Disconnect journal-pinning test in chaos-fixture-mode.test.ts, symmetric to the existing drop test. Disconnect's status: 0 is an unusual shape worth a regression pin.
12abf48 to
696522a
Compare
commit: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Chaos (drop, disconnect, malformed) now applies in proxy mode. Previously chaos was skipped for proxied requests, making them unrealistically reliable.
What it does
Design highlights
applyChaossplit into roll+dispatch (applyChaosAction) so callers that pre-roll can't re-roll by accident.proxyAndRecordreturns"not_configured" | "relayed" | "handled_by_hook"JournalEntry.response.source: "fixture" | "proxy"disambiguates null-fixture entries.aimock_chaos_triggered_totalmetric is tagged by bothactionandsource.Scope
In: malformed / drop / disconnect for proxy mode, non-streaming.
Deferred:
status_override,truncate, streaming (SSE/NDJSON) mutation.Tests
beforeWriteResponsereceives raw upstream bytes byte-equal to source.pnpm format:check✓pnpm lint✓pnpm test✓ (2445 passed)