Skip to content

feat(ai-aliyun-content-moderation): moderate system and tool role content#13646

Open
AlinsRan wants to merge 2 commits into
apache:masterfrom
AlinsRan:feat/aliyun-moderation-system-tool-roles
Open

feat(ai-aliyun-content-moderation): moderate system and tool role content#13646
AlinsRan wants to merge 2 commits into
apache:masterfrom
AlinsRan:feat/aliyun-moderation-system-tool-roles

Conversation

@AlinsRan

@AlinsRan AlinsRan commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Description

Extends the request-side moderation of ai-aliyun-content-moderation beyond the user role, to cover the Agent + tool-calling (MCP) threat model where tool results and a poisoned system prompt can carry indirect prompt injection into the next LLM turn.

New request_check_roles option (array, default ["user"], fully backward compatible):

  • user / tool follow the existing request_check_mode. last now walks the trailing consecutive block of selected-role messages, so a fresh user turn or the current round's tool results are moderated without re-checking conversation history.
  • system ignores request_check_mode and is moderated on every request (all system messages), because it can be poisoned by malicious ToolCall arguments overwriting the system prompt.

Protocol layer gains extract_turn_content(body, mode, roles) and extract_system_content(body) across openai-chat / anthropic-messages / openai-responses / bedrock-converse / openai-embeddings. When a configured role has no extractor on the current protocol, the request is routed through binding.on_unsupported(...) so fail_mode decides, instead of silently passing unmoderated.

With the default ["user"], extract_turn_content(body, mode, {user=true}) is equivalent to the previous extract_user_content(body, mode), so existing behavior is unchanged.

Note: tool-result moderation applies to OpenAI-compatible formats where the tool output is a distinct tool role/item; Anthropic/Bedrock nest tool results inside user messages and are not extracted (documented in the option table).

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change (unit tests for the extractors + end-to-end role moderation, t/plugin/ai-aliyun-content-moderation.t TEST 57–72)
  • I have updated the documentation (en/zh ai-aliyun-content-moderation.md)
  • I have verified backward compatibility (default ["user"] preserves prior behavior)

…tent

Extend request-side moderation beyond the user role via a new
request_check_roles option (array, default ["user"], backward compatible):

- user/tool follow request_check_mode; "last" walks the trailing block of
  selected-role messages, so a fresh user turn or the current round's tool
  results are moderated without re-checking history.
- system ignores request_check_mode and is moderated on every request (all
  system messages), because it can be poisoned by malicious ToolCall
  arguments overwriting the system prompt.

Protocol layer gains extract_turn_content(body, mode, roles) and
extract_system_content(body) across openai-chat/anthropic-messages/
openai-responses/bedrock-converse/openai-embeddings. A configured role with
no extractor on the current protocol is routed through binding.on_unsupported
so fail_mode decides, instead of silently passing unmoderated.

Note: tool-result moderation applies to OpenAI-compatible formats where the
tool output is a distinct tool role; Anthropic/Bedrock nest tool results in
user messages and are not extracted (documented).
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Jul 2, 2026
…ation-system-tool-roles

# Conflicts:
#	apisix/plugins/ai-protocols/bedrock-converse.lua
#	apisix/plugins/ai-protocols/openai-responses.lua
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants