fix(ai-cache): bypass caching for prompts carrying non-text content by AlinsRan · Pull Request #13652 · apache/apisix

AlinsRan · 2026-07-03T00:31:33Z

Description

https://github.com/apache/apisix/actions/runs/28582602012/job/84746279063

Since #13634 the protocol layer's get_messages() flattens structured message content to plain text, dropping image / audio / other non-text typed parts. ai-cache consumed that flattened view, so a text + image prompt became indistinguishable from a text-only one:

L1 (exact): the fingerprint is built from the flattened messages, so text+image collapses to the same key as the text-only prompt → cross-modal L1 collision.
L2 (semantic): window_has_nontext() inspected the flattened messages (content is already a plain string), so it could never detect the image → the multimodal request was embedded and matched the text-only vector.

Either way a multimodal request was wrongly served a text-only cached response (X-AI-Cache-Status: HIT).

#13634 did not touch ai-cache, so this regressed silently on master. t/plugin/ai-cache-semantic.t TEST 64 fails: got 'HIT', expected 'MISS'.

Fix

Detect non-text content from the raw request body (before get_messages() flattens it) and bypass the cache entirely for such a request: read nothing, write nothing (fingerprint left unset so log() skips the write-back), and report a MISS so it always proceeds to the upstream. This restores the multimodal-bypass invariant guarded by TEST 62-64. The semantic layer's own check is switched to the same raw-body detector as a second line of defence.

A prompt that carries non-text content cannot be faithfully keyed while the canonical form is text-only, so not caching it is the safe behaviour.

Verification

t/plugin/ai-cache-semantic.t: 212/212 pass (was 1 failing — TEST 64).
Reverting the fix reproduces exactly got 'HIT', expected 'MISS'.
luacheck clean on both changed files.

Checklist

I have explained the need for this PR and the problem it solves
I have explained the changes or the new features added to this PR
I have added tests corresponding to this change (existing TEST 62-64 now pass)
I have updated the documentation to reflect this change (N/A — behaviour fix)
I have verified that this change is backward compatible

Since apache#13634 the protocol get_messages() flattens structured message content to plain text, dropping image/audio/other typed parts. ai-cache consumed that flattened view, so a text+image prompt became indistinguishable from a text-only one: they collided on the same L1 exact key and on the same L2 semantic cell, letting a multimodal request be wrongly served a text-only cached response (X-AI-Cache-Status: HIT). Detect non-text content from the RAW request body (before flattening) and bypass the cache entirely for such a request -- read nothing, write nothing, report a MISS -- so it always proceeds to the upstream. This restores the multimodal-bypass invariant guarded by t/plugin/ai-cache-semantic.t TEST 62-64.

membphis · 2026-07-03T04:30:40Z

I do not think this is ready to merge yet.

The OpenAI Chat regression fix is headed in the right direction, but the new raw-body detector still misses non-text content for an already supported protocol: Bedrock Converse.

body_has_nontext() currently only returns true when a content block has a type field and that value is not text:

apisix/apisix/plugins/ai-cache/semantic.lua

Lines 138 to 148 in 0339f43

    
           function _M.body_has_nontext(body) 
        
               local messages = body and body.messages 
        
               if type(messages) ~= "table" then 
        
                   return false 
        
               end 
        
               for _, msg in ipairs(messages) do 
        
                   local content = type(msg) == "table" and msg.content 
        
                   if type(content) == "table" then 
        
                       for _, block in ipairs(content) do 
        
                           if type(block) == "table" and block.type and block.type ~= "text" then 
        
                               return true

However, Bedrock Converse content blocks are flattened by reading only block.text, and non-text blocks such as image/document blocks do not need to use this OpenAI-style type field:

apisix/apisix/plugins/ai-protocols/bedrock-converse.lua

Lines 152 to 160 in 0339f43

    
           -- Append the text of each Bedrock content block ({text = "..."}) into `texts`. 
        
           local function append_block_texts(texts, blocks) 
        
               if type(blocks) ~= "table" then 
        
                   return 
        
               end 
        
               for _, block in ipairs(blocks) do 
        
                   if type(block) == "table" and type(block.text) == "string" then 
        
                       core.table.insert(texts, block.text) 
        
                   end

apisix/apisix/plugins/ai-protocols/bedrock-converse.lua

Lines 232 to 258 in 0339f43

    
           --- Get messages in canonical {role, content} format. 
        
           -- Bedrock content blocks [{text: "..."}] are flattened to plain text. 
        
           function _M.get_messages(body) 
        
               local messages = {} 
        
               local system_texts = {} 
        
               append_block_texts(system_texts, body.system) 
        
               if #system_texts > 0 then 
        
                   core.table.insert(messages, { 
        
                       role = "system", 
        
                       content = table.concat(system_texts, " "), 
        
                   }) 
        
               end 
        
               if type(body.messages) == "table" then 
        
                   for _, message in ipairs(body.messages) do 
        
                       if type(message) == "table" then 
        
                           local texts = {} 
        
                           append_block_texts(texts, message.content) 
        
                           if #texts > 0 then 
        
                               core.table.insert(messages, { 
        
                                   role = message.role, 
        
                                   content = table.concat(texts, " "), 
        
                               }) 
        
                           end 
        
                       end 
        
                   end 
        
               end 
        
               return messages

At the same time, the L1 fingerprint excludes raw messages from params, so a Bedrock text-only request and a text+image/text+document request can still canonicalize to the same cache key:

apisix/apisix/plugins/ai-cache/key.lua

Lines 63 to 67 in 0339f43

    
           local params = {} 
        
           for k, v in pairs(body) do 
        
               if k ~= "messages" and k ~= "model" and k ~= "stream" then 
        
                   params[k] = v 
        
               end

Suggested fix: make the raw-body non-text detector protocol-aware, or at least treat Bedrock content blocks that are not pure { text = ... } blocks as non-text, then add a Bedrock exact-cache regression test for text-only versus same text plus image/document.

body_has_nontext() only recognised OpenAI-style typed parts (type ~= "text"), so Bedrock Converse content blocks -- which mark non-text modalities with a distinct key (image/document/toolUse/...) and no `type` field -- slipped through. get_messages() then flattened them to plain text, letting a Bedrock text+image request collide with a text-only one on the same L1 exact key. Treat a block as text only when it carries a string `text` and no non-text discriminator, covering both the OpenAI `type` tag and the Bedrock key-per-modality shape. Adds a Bedrock exact-cache regression (TEST 66-68).

AlinsRan · 2026-07-03T06:19:33Z

Good catch. Bedrock Converse blocks use {image=...}/{document=...} with no type, so the OpenAI-only type ~= "text" check missed them and a Bedrock text+image request still collided with text-only at L1.

Reworked the detector to be shape-agnostic: a block counts as text only if it has a string text and no non-text discriminator, covering both the OpenAI type tag and Bedrock's key-per-modality blocks. Added a Bedrock exact-cache regression (TEST 66-68) — text-only warms L1, same text + image is a MISS; reverting the detector reproduces the HIT, so the test bites.

The access-phase guard bypasses before the fingerprint is computed, so this raw-body detection also covers the L1 canonicalization gap you pointed at — no key.lua change needed.

dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. bug Something isn't working labels Jul 3, 2026

nic-6443 previously approved these changes Jul 3, 2026

View reviewed changes

AlinsRan dismissed nic-6443’s stale review via 67392a3 July 3, 2026 06:15

dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jul 3, 2026

AlinsRan closed this Jul 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(ai-cache): bypass caching for prompts carrying non-text content#13652

fix(ai-cache): bypass caching for prompts carrying non-text content#13652
AlinsRan wants to merge 2 commits into
apache:masterfrom
AlinsRan:fix/ai-cache-multimodal-bypass

AlinsRan commented Jul 3, 2026 •

edited

Loading

Uh oh!

membphis commented Jul 3, 2026

Uh oh!

AlinsRan commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

AlinsRan commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Fix

Verification

Checklist

Uh oh!

membphis commented Jul 3, 2026

Uh oh!

AlinsRan commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

AlinsRan commented Jul 3, 2026 •

edited

Loading