FEAT: Round Robin Target #1761
Open
jsong468 wants to merge 1 commit into
Round Robin Target
Description and design decisions
- New `RoundRobinTarget` class (`pyrit/prompt_target/round_robin_target.py`): a `PromptTarget` that wraps multiple inner targets and distributes requests across them using weighted round-robin selection. Intended for load-balancing across multiple deployments of the same model (e.g., Azure OpenAI endpoints in different regions).
- Per-call distribution, not per-conversation: requests are distributed on every call to `_send_prompt_to_target_async`, not pinned to a conversation. This is safe because PyRIT's conversation history is managed at the `conversation_id` level in shared memory, not by the target itself. When any inner target handles a request, the base class `_get_normalized_conversation_async` fetches the full conversation from memory by `conversation_id`, appends the current message, and passes the complete history to the inner target. The inner target never needs to "remember" prior turns; it receives them in full every time. This architecture means switching inner targets mid-conversation has no effect on correctness.
- Requires multi-turn + editable history: all inner targets must support `supports_multi_turn` and `supports_editable_history`. This is enforced at construction using the existing `CHAT_TARGET_REQUIREMENTS` validation infrastructure. These capabilities guarantee the target rebuilds its state from the provided conversation rather than relying on server-side state.
- Same concrete class required: all inner targets must be the same Python class (e.g., all `OpenAIChatTarget`). This prevents mixing fundamentally different target types that happen to share the same interface.
- Behavioral parameter consistency: inner targets must have matching `underlying_model_name` (with `model_name` fallback), `temperature`, and `top_p`. This ensures scoring results are comparable across targets. The validation uses the same (newly introduced) constants (`TARGET_BEHAVIORAL_PARAMS`, `TARGET_BEHAVIORAL_PARAM_FALLBACKS`) as the eval hash computation, so they cannot drift.
- Capability intersection: the round-robin's capabilities are the intersection (lower bound) of all inner targets' capabilities. Boolean capability flags are AND-ed; modality frozensets are intersected. If the intersection of input or output modalities is empty, construction fails.
- Optional integer weights: `weights=[2, 1]` expands into a rotation list `[0, 0, 1]` that cycles, sending roughly 2x traffic to the first target. Default is equal weight.
- Memory entries use the round-robin's identifier: the `prompt_target_identifier` on request and response pieces is the `RoundRobinTarget`'s own `ComponentIdentifier`. This keeps memory entries consistent: a single conversation shows one identifier throughout. The hash of the inner target that actually handled each request is recorded in `prompt_metadata["inner_target_identifier"]` for traceability.
- Eval hash unwrap mechanism (`pyrit/identifiers/evaluation_identifier.py`): added `unwrap_child` field to `ChildEvalRule`. When set, the eval hash computation "sees through" wrapper targets by substituting the first inner child before applying param filtering. This ensures `scorer(round_robin([t1_east, t1_west]))` produces the same eval hash as `scorer(t1_east)`, making scoring results comparable regardless of whether a round-robin was used. Applied to `ScorerEvaluationIdentifier` (`prompt_target` child) and `AtomicAttackEvaluationIdentifier` (`objective_target` child).
- Why round-robin identifier on memory entries but unwrap in eval hash: these serve different purposes and operate at different layers. The `prompt_target_identifier` on memory entries answers "what component was responsible for this request?" which is the `RoundRobinTarget`, since that's what the caller passed to the normalizer or scorer. Stamping inner target identifiers would create inconsistency within a single conversation (different turns showing different identifiers) and would require overriding `_get_normalized_conversation_async` to mutate message pieces, adding complexity for no functional gain. The inner target that actually handled each request is still traceable via `prompt_metadata["inner_target_identifier"]`. The eval hash, by contrast, answers a completely different question: "are these two scorer configurations behaviorally equivalent for grouping evaluation results?" For that purpose, what matters isn't the wrapper but rather the underlying model, temperature, and top_p. The unwrap mechanism lives entirely in the eval hash computation layer and doesn't touch memory entries, identifiers, or runtime behavior. Keeping these two concerns separate means the memory layer stays simple (no hook overrides, no mutation) while the eval layer correctly groups results regardless of whether a round-robin was used.
- Prompt caching trade-off: switching targets mid-conversation defeats provider-side prompt prefix caching (e.g., OpenAI cached tokens). This is a cost/latency trade-off, not a correctness issue, and is documented in the class docstring.
- Concurrency safety: the only shared mutable state is `self._counter` (the rotation index), which is only mutated in the synchronous `_next_target()` method. Under Python's asyncio cooperative concurrency model, this is safe: no two coroutines can interleave within a synchronous method. Crucially, because the target is selected synchronously (as a local variable) before the `await` call to `_send_prompt_to_target_async`, even if another coroutine advances `_counter` while the first is waiting on the network call, the already-selected target reference cannot be affected. Not safe for multi-threaded use, consistent with the rest of PyRIT's target classes.
- Minimal override surface: only `_send_prompt_to_target_async` and `_build_identifier` are overridden. No override of `_get_normalized_conversation_async` or `set_system_prompt`; the base class handles both correctly since all memory operations are keyed by `conversation_id` and stamped with `self.get_identifier()`.

Tests and Documentation
- Unit tests (`tests/unit/prompt_target/test_round_robin_target.py`): 24 tests covering:
  - `_send_prompt_to_target_async` delegates to correct inner target, records `inner_target_identifier` in metadata, round-robins across calls
  - `set_system_prompt`: uses round-robin identifier (verified via memory lookup)
  - `send_prompt_async` flow keeps round-robin identifier on entries
  - `underlying_model_name`, rejects mismatched `temperature`, accepts matching params with different endpoints, uses `model_name` fallback
- Eval hash unwrap tests (`tests/unit/identifiers/test_evaluation_identifier.py`): 3 tests added:
  - `test_unwrap_substitutes_first_inner_child`: verifies the unwrap produces the same hash as the direct target
  - `test_unwrap_no_op_when_child_has_no_matching_subchild`: verifies non-wrapper targets are unaffected
  - `test_scorer_eval_hash_matches_with_and_without_round_robin`: end-to-end `ScorerEvaluationIdentifier` equivalence
- Documentation notebook (`doc/code/targets/round_robin_target.ipynb` and `round_robin_target.py`): 5 sections demonstrating:
  - `PromptSendingAttack`
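For readers who want a concrete picture of the "Optional integer weights" and "Concurrency safety" items in the description, here is a minimal standalone sketch. It is not the code in this PR: `RotationSketch`, its `send` method, and the string targets are hypothetical stand-ins, and `asyncio.sleep(0)` stands in for the network call. It only illustrates how a weight list can expand into a rotation and why choosing the target before the first `await` keeps the selection stable.

```python
import asyncio
from typing import List, Optional, Sequence


class RotationSketch:
    """Hypothetical stand-in for the selection logic only (not the PR's class)."""

    def __init__(self, targets: Sequence[str], weights: Optional[Sequence[int]] = None) -> None:
        if not targets:
            raise ValueError("at least one target is required")
        resolved = list(weights) if weights is not None else [1] * len(targets)
        if len(resolved) != len(targets):
            raise ValueError("weights must have one entry per target")
        if any(not isinstance(w, int) or w < 1 for w in resolved):
            raise ValueError("weights must be positive integers")
        self._targets = list(targets)
        # Expand weights into a rotation of target indices: [2, 1] -> [0, 0, 1].
        self._rotation: List[int] = [i for i, w in enumerate(resolved) for _ in range(w)]
        self._counter = 0

    def next_target(self) -> str:
        # Fully synchronous: no await here, so coroutines cannot interleave.
        index = self._rotation[self._counter % len(self._rotation)]
        self._counter += 1
        return self._targets[index]

    async def send(self, prompt: str) -> str:
        target = self.next_target()  # selected into a local before any await
        await asyncio.sleep(0)       # placeholder for the real network call
        return f"{target}:{prompt}"


async def demo() -> List[str]:
    rr = RotationSketch(["east", "west"], weights=[2, 1])
    # Six concurrent sends; each picks its target before yielding control.
    return await asyncio.gather(*(rr.send(f"p{i}") for i in range(6)))


results = asyncio.run(demo())
```

Under these assumptions, two of every three requests go to the first target, and a coroutine that is suspended on its call keeps the target it already picked even while others advance the counter.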
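The "Capability intersection" rule can be sketched in a few lines. This is an illustration under stated assumptions, not PyRIT's capability model: `CapabilitySketch` and `intersect_capabilities` are hypothetical names, and modalities are represented as plain strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class CapabilitySketch:
    """Hypothetical capability record; field names mirror the description."""

    supports_multi_turn: bool
    supports_editable_history: bool
    input_modalities: FrozenSet[str]
    output_modalities: FrozenSet[str]


def intersect_capabilities(caps: Sequence[CapabilitySketch]) -> CapabilitySketch:
    if not caps:
        raise ValueError("at least one capability set is required")
    # Modality sets are intersected across all inner targets.
    inputs = frozenset.intersection(*(c.input_modalities for c in caps))
    outputs = frozenset.intersection(*(c.output_modalities for c in caps))
    if not inputs or not outputs:
        raise ValueError("inner targets share no common input or output modality")
    # Boolean flags are AND-ed: the wrapper only claims what every target supports.
    return CapabilitySketch(
        supports_multi_turn=all(c.supports_multi_turn for c in caps),
        supports_editable_history=all(c.supports_editable_history for c in caps),
        input_modalities=inputs,
        output_modalities=outputs,
    )


text_and_image = CapabilitySketch(True, True, frozenset({"text", "image"}), frozenset({"text"}))
text_only = CapabilitySketch(True, True, frozenset({"text"}), frozenset({"text"}))
combined = intersect_capabilities([text_and_image, text_only])
```

The result is a lower bound: a caller that relies only on `combined` is safe no matter which inner target handles a given request.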
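The "Behavioral parameter consistency" check might look roughly like the sketch below. The real constants `TARGET_BEHAVIORAL_PARAMS` and `TARGET_BEHAVIORAL_PARAM_FALLBACKS` are not shown in this description, so the `_SKETCH` constants here only assume a plausible shape (a tuple of names and a name-to-fallback mapping), and inner targets are modelled as plain dicts.

```python
from typing import Any, Mapping, Sequence

# Assumed shapes; the PR's actual constants may be structured differently.
BEHAVIORAL_PARAMS_SKETCH = ("underlying_model_name", "temperature", "top_p")
BEHAVIORAL_FALLBACKS_SKETCH = {"underlying_model_name": "model_name"}


def effective_param(params: Mapping[str, Any], name: str) -> Any:
    """Return the parameter value, using the fallback key when the primary is unset."""
    value = params.get(name)
    if value is None and name in BEHAVIORAL_FALLBACKS_SKETCH:
        value = params.get(BEHAVIORAL_FALLBACKS_SKETCH[name])
    return value


def validate_consistent(all_params: Sequence[Mapping[str, Any]]) -> None:
    """Raise ValueError if inner targets disagree on any behavioral parameter."""
    for name in BEHAVIORAL_PARAMS_SKETCH:
        values = {effective_param(p, name) for p in all_params}
        if len(values) > 1:
            shown = sorted(repr(v) for v in values)
            raise ValueError(f"inner targets disagree on {name}: {shown}")


east = {"endpoint": "https://east.example", "underlying_model_name": "m1", "temperature": 0.0, "top_p": 1.0}
# Same model reached through the fallback key, different endpoint.
west = {"endpoint": "https://west.example", "model_name": "m1", "temperature": 0.0, "top_p": 1.0}
validate_consistent([east, west])  # passes: endpoint is not a behavioral parameter
```

Endpoints are deliberately ignored: they identify where a deployment lives, not how the model behaves.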
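To make the "Eval hash unwrap mechanism" easier to reason about, the following sketch models identifiers as nested dicts and hashes only the behavioral parameters. Everything here is an assumption for illustration: the dict layout, the `children` and `params` keys, and the `eval_hash_sketch` function are hypothetical and do not reflect the actual `ChildEvalRule` or `ComponentIdentifier` implementation.

```python
import hashlib
import json
from typing import Any, Dict

# Assumed set of behavioral keys, taken from the description.
BEHAVIORAL_KEYS_SKETCH = ("underlying_model_name", "temperature", "top_p")


def unwrap_first_child(identifier: Dict[str, Any]) -> Dict[str, Any]:
    """Substitute the first inner child for a wrapper; no-op for plain targets."""
    children = identifier.get("children") or []
    return children[0] if children else identifier


def eval_hash_sketch(identifier: Dict[str, Any], *, unwrap_child: bool = True) -> str:
    target = unwrap_first_child(identifier) if unwrap_child else identifier
    params = target.get("params", {})
    # Param filtering happens after the unwrap, so the wrapper itself is invisible.
    filtered = {key: params.get(key) for key in BEHAVIORAL_KEYS_SKETCH}
    payload = json.dumps(filtered, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


t1_east = {
    "class": "OpenAIChatTarget",
    "params": {"underlying_model_name": "m1", "temperature": 0.0, "top_p": 1.0, "endpoint": "east"},
}
t1_west = {
    "class": "OpenAIChatTarget",
    "params": {"underlying_model_name": "m1", "temperature": 0.0, "top_p": 1.0, "endpoint": "west"},
}
round_robin = {"class": "RoundRobinTarget", "params": {}, "children": [t1_east, t1_west]}

wrapped_hash = eval_hash_sketch(round_robin)
direct_hash = eval_hash_sketch(t1_east)
```

In this model the wrapped and direct hashes coincide, which is the property the description claims for `scorer(round_robin([t1_east, t1_west]))` versus `scorer(t1_east)`; disabling the unwrap makes them diverge because the wrapper carries no behavioral parameters of its own.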