feat(ai-aliyun-content-moderation): moderate system and tool role content #13646
Open
AlinsRan wants to merge 2 commits into
…tent

Extend request-side moderation beyond the user role via a new request_check_roles option (array, default ["user"], backward compatible):

- user/tool follow request_check_mode; "last" walks the trailing block of selected-role messages, so a fresh user turn or the current round's tool results are moderated without re-checking history.
- system ignores request_check_mode and is moderated on every request (all system messages), because it can be poisoned by malicious ToolCall arguments overwriting the system prompt.

Protocol layer gains extract_turn_content(body, mode, roles) and extract_system_content(body) across openai-chat/anthropic-messages/openai-responses/bedrock-converse/openai-embeddings. A configured role with no extractor on the current protocol is routed through binding.on_unsupported so fail_mode decides, instead of silently passing unmoderated.

Note: tool-result moderation applies to OpenAI-compatible formats where the tool output is a distinct tool role; Anthropic/Bedrock nest tool results in user messages and are not extracted (documented).
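The following is a rough Python sketch of the extraction semantics this commit describes. It is not the plugin's Lua code. It assumes OpenAI-chat-shaped bodies and takes `roles` as a list of role names, where the Lua takes a set such as `{user=true}`. It also makes two readings of the text. First, any `request_check_mode` other than `"last"` checks every selected-role message. Second, "trailing block" is taken literally: the backward walk stops at the first message whose role is not selected and does not skip over it. The helper names match the PR, but their bodies and return types are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, List


def _text_of(message: Dict[str, Any]) -> str:
    """Flatten OpenAI-chat content (a string or a list of text parts)."""
    content = message.get("content")
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(
            part.get("text", "")
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
    return ""  # e.g. an assistant message that only carries tool_calls


def extract_turn_content(body: Dict[str, Any], mode: str,
                         roles: Iterable[str]) -> List[str]:
    """Sketch: collect texts of selected-role messages according to mode."""
    selected = set(roles)
    messages = body.get("messages", [])
    if mode != "last":
        # ASSUMPTION: any non-"last" mode checks every selected-role message.
        return [_text_of(m) for m in messages if m.get("role") in selected]
    # "last": walk backwards over the trailing consecutive block of
    # selected-role messages and stop at the first message outside it.
    block: List[str] = []
    for message in reversed(messages):
        if message.get("role") not in selected:
            break
        block.append(_text_of(message))
    block.reverse()  # restore chronological order
    return block


def extract_system_content(body: Dict[str, Any]) -> List[str]:
    """Sketch: every system message, regardless of request_check_mode."""
    return [_text_of(m) for m in body.get("messages", [])
            if m.get("role") == "system"]
```

Consider a conversation that ends with an assistant tool call followed by two `tool` messages. Under this sketch, `extract_turn_content(body, "last", ["user", "tool"])` returns only those two tool outputs, and the earlier user turn is not re-checked. `extract_system_content(body)` always returns the system prompt, so a poisoned system message is still caught on a round where nothing else changed.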
…ation-system-tool-roles

# Conflicts:
#	apisix/plugins/ai-protocols/bedrock-converse.lua
#	apisix/plugins/ai-protocols/openai-responses.lua
shreemaan-abhishek approved these changes on Jul 2, 2026
Description
Extends the request-side moderation of `ai-aliyun-content-moderation` beyond the `user` role. This covers the Agent + tool-calling (MCP) threat model, in which tool results and a poisoned system prompt can carry indirect prompt injection into the next LLM turn.

New `request_check_roles` option (array, default `["user"]`, fully backward compatible):

- `user` / `tool`: follow `request_check_mode`. `last` now walks the trailing consecutive block of selected-role messages. A fresh user turn or the current round's tool results are therefore moderated without re-checking conversation history.
- `system`: ignores `request_check_mode` and is moderated on every request (all system messages), because malicious ToolCall arguments can overwrite and poison the system prompt.

The protocol layer gains `extract_turn_content(body, mode, roles)` and `extract_system_content(body)` across `openai-chat` / `anthropic-messages` / `openai-responses` / `bedrock-converse` / `openai-embeddings`. When a configured role has no extractor on the current protocol, the request is routed through `binding.on_unsupported(...)` and `fail_mode` decides the outcome. Previously, such a request would silently pass unmoderated.

With the default `["user"]`, `extract_turn_content(body, mode, {user=true})` is equivalent to the previous `extract_user_content(body, mode)`, so existing behavior is unchanged.

Note: tool-result moderation applies only to OpenAI-compatible formats, where the tool output is a distinct `tool` role/item. Anthropic/Bedrock nest tool results inside user messages, so those results are not extracted (documented in the option table).

Checklist
- Tests added: `t/plugin/ai-aliyun-content-moderation.t` (TEST 57–72)
- Documentation updated: `ai-aliyun-content-moderation.md`
- Backward compatible: the default `["user"]` preserves prior behavior
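The protocol-layer routing and the backward-compatibility claim can be sketched the same way, again in Python rather than the plugin's Lua. Only the protocol names come from the PR. Everything else is hypothetical:

- `SUPPORTED_ROLES` draws just its missing `tool` entries for Anthropic/Bedrock from the note above; the other coverage is illustrative.
- `on_unsupported` is a stand-in for `binding.on_unsupported`.
- The `fail_mode` values `"skip"` and `"deny"` are invented for illustration.

The sketch reads "the request is routed through `binding.on_unsupported`" to mean that the request's fate is handed to `fail_mode` as soon as any configured role lacks an extractor.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence

# ASSUMED per-protocol role coverage. Only "no tool extractor for
# Anthropic/Bedrock" comes from the PR text; the rest is illustrative.
SUPPORTED_ROLES: Dict[str, FrozenSet[str]] = {
    "openai-chat": frozenset({"user", "tool", "system"}),
    "openai-responses": frozenset({"user", "tool", "system"}),
    "anthropic-messages": frozenset({"user", "system"}),
    "bedrock-converse": frozenset({"user", "system"}),
    "openai-embeddings": frozenset({"user"}),
}


@dataclass
class Decision:
    action: str        # hypothetical labels: "moderate", "pass" or "reject"
    roles: List[str]   # roles whose content will be sent for moderation
    reason: str = ""   # stands in for whatever the real binding logs


def on_unsupported(fail_mode: str, protocol: str,
                   missing: Sequence[str]) -> Decision:
    """Hypothetical stand-in for binding.on_unsupported: fail_mode decides."""
    reason = f"{protocol} cannot extract role(s): {', '.join(missing)}"
    if fail_mode == "deny":  # hypothetical value: fail closed
        return Decision("reject", [], reason)
    # hypothetical "skip": fail open, but with a recorded reason
    return Decision("pass", [], reason)


def plan_request_check(protocol: str,
                       check_roles: Sequence[str] = ("user",),
                       fail_mode: str = "skip") -> Decision:
    """Decide how a request is handled for the configured request_check_roles."""
    supported = SUPPORTED_ROLES.get(protocol, frozenset())
    missing = [role for role in check_roles if role not in supported]
    if missing:
        # Never moderate a partial set of roles silently.
        return on_unsupported(fail_mode, protocol, missing)
    return Decision("moderate", list(check_roles))
```

With the default roles `("user",)`, every protocol in the table takes the plain moderation path, which mirrors the PR's claim that existing configurations behave as before. `fail_mode` is reached only when a route opts into `tool` or `system` on a protocol that cannot extract that role, for example `tool` on `anthropic-messages`.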