fix(ai-cache): bypass caching for prompts carrying non-text content#13652
fix(ai-cache): bypass caching for prompts carrying non-text content#13652AlinsRan wants to merge 2 commits into
Conversation
Since apache#13634 the protocol get_messages() flattens structured message content to plain text, dropping image/audio/other typed parts. ai-cache consumed that flattened view, so a text+image prompt became indistinguishable from a text-only one: they collided on the same L1 exact key and on the same L2 semantic cell, letting a multimodal request be wrongly served a text-only cached response (X-AI-Cache-Status: HIT). Detect non-text content from the RAW request body (before flattening) and bypass the cache entirely for such a request -- read nothing, write nothing, report a MISS -- so it always proceeds to the upstream. This restores the multimodal-bypass invariant guarded by t/plugin/ai-cache-semantic.t TEST 62-64.
|
I do not think this is ready to merge yet. The OpenAI Chat regression fix is headed in the right direction, but the new raw-body detector still misses non-text content for an already supported protocol: Bedrock Converse.
apisix/apisix/plugins/ai-cache/semantic.lua Lines 138 to 148 in 0339f43 However, Bedrock Converse content blocks are flattened by reading only apisix/apisix/plugins/ai-protocols/bedrock-converse.lua Lines 152 to 160 in 0339f43 apisix/apisix/plugins/ai-protocols/bedrock-converse.lua Lines 232 to 258 in 0339f43 At the same time, the L1 fingerprint excludes raw apisix/apisix/plugins/ai-cache/key.lua Lines 63 to 67 in 0339f43 Suggested fix: make the raw-body non-text detector protocol-aware, or at least treat Bedrock content blocks that are not pure |
body_has_nontext() only recognised OpenAI-style typed parts (type ~= "text"), so Bedrock Converse content blocks -- which mark non-text modalities with a distinct key (image/document/toolUse/...) and no `type` field -- slipped through. get_messages() then flattened them to plain text, letting a Bedrock text+image request collide with a text-only one on the same L1 exact key. Treat a block as text only when it carries a string `text` and no non-text discriminator, covering both the OpenAI `type` tag and the Bedrock key-per-modality shape. Adds a Bedrock exact-cache regression (TEST 66-68).
|
Good catch. Bedrock Converse blocks use Reworked the detector to be shape-agnostic: a block counts as text only if it has a string The access-phase guard bypasses before the fingerprint is computed, so this raw-body detection also covers the L1 canonicalization gap you pointed at — no key.lua change needed. |
Description
https://github.com/apache/apisix/actions/runs/28582602012/job/84746279063
Since #13634 the protocol layer's
get_messages()flattens structured message content to plain text, dropping image / audio / other non-text typed parts.ai-cacheconsumed that flattened view, so atext + imageprompt became indistinguishable from a text-only one:text+imagecollapses to the same key as the text-only prompt → cross-modal L1 collision.window_has_nontext()inspected the flattened messages (contentis already a plain string), so it could never detect the image → the multimodal request was embedded and matched the text-only vector.Either way a multimodal request was wrongly served a text-only cached response (
X-AI-Cache-Status: HIT).#13634did not touchai-cache, so this regressed silently on master.t/plugin/ai-cache-semantic.tTEST 64 fails:got 'HIT', expected 'MISS'.Fix
Detect non-text content from the raw request body (before
get_messages()flattens it) and bypass the cache entirely for such a request: read nothing, write nothing (fingerprint left unset solog()skips the write-back), and report aMISSso it always proceeds to the upstream. This restores the multimodal-bypass invariant guarded by TEST 62-64. The semantic layer's own check is switched to the same raw-body detector as a second line of defence.A prompt that carries non-text content cannot be faithfully keyed while the canonical form is text-only, so not caching it is the safe behaviour.
Verification
t/plugin/ai-cache-semantic.t: 212/212 pass (was 1 failing — TEST 64).got 'HIT', expected 'MISS'.luacheckclean on both changed files.Checklist