
fix(ai-cache): preserve non-text content in exact key and L2 bypass #13654

Merged
shreemaan-abhishek merged 3 commits into apache:master from shreemaan-abhishek:fix/ai-cache-multimodal-key
Jul 3, 2026

@shreemaan-abhishek

Contributor

Description

ai-protocols' get_messages() flattens structured message content (images, tool calls) down to plain text. That silently broke ai-cache fidelity at both cache layers:

  1. L1 exact key: key.fingerprint hashes the get_messages output. After flattening, a [{text}, {image_url}] prompt and a plain-text prompt carrying the same text collapse to the identical string, so they collide on one L1 key and the multimodal request is wrongly served the text-only response.
  2. L2 semantic bypass: window_has_nontext only tripped on type(content) == "table", which no longer exists after flattening, so multimodal prompts stopped bypassing the semantic layer and could hit a same-text vector.
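The L1 collision can be reproduced in a few lines. This is a minimal, self-contained Lua sketch, not the plugin's code: flatten_messages, canonical_encode and key_repr are hypothetical stand-ins for get_messages(), core.json.canonical_encode and the key.lua fingerprint, and the sketch compares canonical strings instead of hashing them, since plain Lua has no digest function. It shows both the collision and how folding the raw messages into the repr keeps the two prompts apart.

```lua
-- Hypothetical stand-in for get_messages(): keeps only the text parts of
-- each message, mirroring the flattening described above.
local function flatten_messages(messages)
    local out = {}
    for i, msg in ipairs(messages) do
        local content = msg.content
        if type(content) == "table" then
            local parts = {}
            for _, block in ipairs(content) do
                if type(block) == "table" and block.type == "text" then
                    parts[#parts + 1] = block.text
                end
            end
            content = table.concat(parts, "\n")
        end
        out[i] = { role = msg.role, content = content }
    end
    return out
end

-- Hypothetical stand-in for core.json.canonical_encode: object keys are
-- emitted in sorted order, so equal tables always encode to the same string.
local function canonical_encode(v)
    if type(v) == "string" then
        return string.format("%q", v)
    elseif type(v) ~= "table" then
        return tostring(v)
    end
    local parts = {}
    if v[1] ~= nil then
        -- Array: keep element order.
        for _, item in ipairs(v) do
            parts[#parts + 1] = canonical_encode(item)
        end
        return "[" .. table.concat(parts, ",") .. "]"
    end
    -- Object: sort keys for a deterministic encoding.
    local keys = {}
    for k in pairs(v) do
        keys[#keys + 1] = k
    end
    table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
    for _, k in ipairs(keys) do
        parts[#parts + 1] = string.format("%q", tostring(k)) .. ":" .. canonical_encode(v[k])
    end
    return "{" .. table.concat(parts, ",") .. "}"
end

-- Hypothetical key repr: flattened messages, optionally plus raw messages.
local function key_repr(body, fold_raw)
    local repr = { messages = flatten_messages(body.messages) }
    if fold_raw then
        repr.raw_messages = body.messages
    end
    return canonical_encode(repr)
end

local text_only = { messages = { { role = "user", content = "describe this" } } }
local text_and_image = { messages = { {
    role = "user",
    content = {
        { type = "text", text = "describe this" },
        { type = "image_url", image_url = { url = "https://example.com/cat.png" } },
    },
} } }

-- Flattened only: both prompts produce the same repr (the L1 collision).
print(key_repr(text_only, false) == key_repr(text_and_image, false)) --> true
-- With raw messages folded in: the reprs differ.
print(key_repr(text_only, true) == key_repr(text_and_image, true))   --> false
```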

Fix

  • key.lua: fold the raw body.messages into the L1 fingerprint so a text+image prompt stays distinct from a text-only one. (params already preserves every other body field verbatim, so input-based protocols were never affected; only messages lost fidelity.)
  • semantic.lua: replace window_has_nontext with body_has_nontext, which scans the raw body and bypasses L2 whenever any content block is non-text. It also tolerates malformed input: message items that are not tables, and content given as a single block object instead of an array.
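A hypothetical sketch of the bypass check described above, assuming OpenAI-style messages whose content is either a string or an array of typed blocks. It is not the code merged in semantic.lua: block_is_nontext is an invented helper name, and treating every table block whose type is not "text" as non-text is an assumption of the sketch.

```lua
-- Assumption: string blocks are text; a table block is text only when its
-- type field is "text" (image_url and any untyped table count as non-text).
local function block_is_nontext(block)
    return type(block) == "table" and block.type ~= "text"
end

local function body_has_nontext(body)
    if type(body) ~= "table" or type(body.messages) ~= "table" then
        return false
    end
    for _, msg in ipairs(body.messages) do
        -- Guard: ignore message items that are not tables.
        if type(msg) == "table" and type(msg.content) == "table" then
            local content = msg.content
            if content[1] == nil and content.type ~= nil then
                -- Guard: content given as a single block object, not an array.
                if block_is_nontext(content) then
                    return true
                end
            else
                for _, block in ipairs(content) do
                    if block_is_nontext(block) then
                        return true
                    end
                end
            end
        end
    end
    return false
end

print(body_has_nontext({ messages = { { role = "user", content = "hi" } } })) --> false
print(body_has_nontext({ messages = { { role = "user", content = {
    { type = "text", text = "hi" },
    { type = "image_url", image_url = { url = "https://example.com/cat.png" } },
} } } })) --> true
```

Erring toward bypass is the safer direction here: a false positive only skips a semantic lookup that might have hit, while a false negative can serve a response that was produced for a different prompt.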

Why it looked flaky

t/plugin/ai-cache-semantic.t TEST 64 asserts a multimodal prompt is a MISS. Whether the collision surfaces depends on TEST 63's async log-phase L2 write-back racing TEST 64's read (--- wait: 0.5), so it reproduces intermittently. It reproduces deterministically against a local RediSearch.

Tests

  • t/plugin/ai-cache-semantic.t — TEST 64 (end-to-end MISS) now passes; added TEST 66 unit-testing the malformed-input handling. Full file green (215 tests), luacheck clean.

Checklist:

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change (N/A: internal bugfix, no behavior/schema change)
  • I have verified that this change is backward compatible

get_messages() flattens structured message content (images, tool calls)
to plain text, which silently broke ai-cache fidelity at both layers:

- L1 exact key: key.fingerprint hashes the flattened messages, so a
  text+image prompt collides with a text-only one carrying the same text
  and is wrongly served the text-only response.
- L2 semantic bypass: window_has_nontext only tripped on table content,
  which no longer exists after flattening, so multimodal prompts stopped
  bypassing the semantic layer and could hit a same-text vector.

Fold the raw messages into the L1 fingerprint, and scan the raw body for
non-text blocks to drive the L2 bypass. body_has_nontext also guards
non-table message items and content shaped as a single block object.

t/plugin/ai-cache-semantic.t TEST 64 covers the end-to-end MISS; TEST 66
unit-tests the malformed-input handling.
@dosubot (bot) added the size:M (This PR changes 30-99 lines, ignoring generated files.) and bug (Something isn't working) labels on Jul 3, 2026
membphis previously approved these changes on Jul 3, 2026

@membphis left a comment

Member


Reviewed the patch and did not find merge-blocking issues.

get_messages() also drops assistant tool_calls (and legacy function_call),
so two prompts with identical text but different tool calls could collide
on one L2 cell and return a wrong HIT. Treat non-empty tool_calls /
function_call as non-text in body_has_nontext, same as media blocks.
Extend TEST 66 to cover them.
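The tool-call case can be sketched the same way. In this hypothetical helper (body_has_call_state is an invented name; per the commit above, the merged change folds this check into body_has_nontext), any message with a non-empty tool_calls array or a non-empty function_call object counts as call state, assuming OpenAI-style field names. Checking type(...) == "table" rather than ~= nil also keeps a JSON null, which decoders such as lua-cjson represent with a sentinel value instead of nil, from counting as call state.

```lua
-- True when v is a table with at least one entry.
local function is_nonempty_table(v)
    return type(v) == "table" and next(v) ~= nil
end

-- True when a message carries call state that flattening would drop.
local function has_call_state(msg)
    return type(msg) == "table"
        and (is_nonempty_table(msg.tool_calls) or is_nonempty_table(msg.function_call))
end

-- Hypothetical: scan every message in the raw body for call state.
local function body_has_call_state(body)
    if type(body) ~= "table" or type(body.messages) ~= "table" then
        return false
    end
    for _, msg in ipairs(body.messages) do
        if has_call_state(msg) then
            return true
        end
    end
    return false
end

-- Same user text as a plain prompt, but the assistant turn carries a tool call.
local with_tool_call = { messages = {
    { role = "user", content = "weather in Paris?" },
    { role = "assistant", content = "", tool_calls = {
        { id = "call_1", type = "function",
          ["function"] = { name = "get_weather", arguments = '{"city":"Paris"}' } },
    } },
} }

print(body_has_call_state(with_tool_call)) --> true
```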
@dosubot (bot) added the size:L (This PR changes 100-499 lines, ignoring generated files.) label and removed the size:M (This PR changes 30-99 lines, ignoring generated files.) label on Jul 3, 2026

Copilot AI left a comment



Pull request overview

This PR fixes ai-cache correctness for multimodal and tool-call prompts after ai-protocols began flattening structured message content to plain text, which could cause L1 key collisions and allow unsafe L2 semantic hits across distinct prompts.

Changes:

  • Extend the L1 exact-cache fingerprinting to incorporate raw body.messages, preventing text-only vs text+media collisions.
  • Replace the L2 “non-text” gate with body_has_nontext() that scans the raw request body for non-text blocks and tool/function call state before allowing semantic lookup.
  • Add a focused unit-style test to validate body_has_nontext() behavior, including malformed input tolerance.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

  • apisix/plugins/ai-cache/key.lua: Incorporates raw messages into the L1 fingerprint to preserve multimodal/tool-call fidelity.
  • apisix/plugins/ai-cache/semantic.lua: Adds body_has_nontext() and uses it to bypass L2 when prompts include non-text/tool-call state.
  • t/plugin/ai-cache-semantic.t: Adds coverage for body_has_nontext() detection and malformed input handling.


Comment on lines 105 to 110
local repr = build_repr(ctx, body, client_messages(ctx, body))
-- client_messages() (get_messages) flattens structured content to plain text,
-- so an exact fingerprint of it alone would let a text+image prompt collide
-- with a text-only one. Fold the raw messages in to keep them distinct.
repr.raw_messages = body.messages
return hex_digest(core.json.canonical_encode(repr))

Contributor Author


This can be handled in a later PR; we need to merge this one soon to fix the failing CI.

membphis approved these changes on Jul 3, 2026
@shreemaan-abhishek merged commit b83f323 into apache:master on Jul 3, 2026
22 checks passed

Labels

bug (Something isn't working), size:L (This PR changes 100-499 lines, ignoring generated files.)


5 participants