
FEAT: Round Robin Target #1761

Open
jsong468 wants to merge 1 commit into microsoft:main from jsong468:round_robin



@jsong468
Contributor

Round Robin Target

Description and design decisions

  • New RoundRobinTarget class (pyrit/prompt_target/round_robin_target.py): a PromptTarget that wraps multiple inner targets and distributes requests across them using weighted round-robin selection. Intended for load-balancing across multiple deployments of the same model (e.g., Azure OpenAI endpoints in different regions).

  • Per-call distribution, not per-conversation: requests are distributed on every call to _send_prompt_to_target_async, not pinned to a conversation. This is safe because PyRIT's conversation history is managed at the conversation_id level in shared memory — not by the target itself. When any inner target handles a request, the base class _get_normalized_conversation_async fetches the full conversation from memory by conversation_id, appends the current message, and passes the complete history to the inner target. The inner target never needs to "remember" prior turns; it receives them in full every time. This architecture means switching inner targets mid-conversation has no effect on correctness.

  • Requires multi-turn + editable history: all inner targets must support supports_multi_turn and supports_editable_history. This is enforced at construction using the existing CHAT_TARGET_REQUIREMENTS validation infrastructure. These capabilities guarantee the target rebuilds its state from the provided conversation rather than relying on server-side state.

  • Same concrete class required: all inner targets must be the same Python class (e.g., all OpenAIChatTarget). This prevents mixing fundamentally different target types that happen to share the same interface.

  • Behavioral parameter consistency: inner targets must have matching underlying_model_name (with model_name fallback), temperature, and top_p. This ensures scoring results are comparable across targets. The validation uses the same (newly introduced) constants (TARGET_BEHAVIORAL_PARAMS, TARGET_BEHAVIORAL_PARAM_FALLBACKS) as the eval hash computation, so they cannot drift.

  • Capability intersection: the round-robin's capabilities are the intersection (lower bound) of all inner targets' capabilities. Boolean capability flags are AND-ed; modality frozensets are intersected. If the intersection of input or output modalities is empty, construction fails.

  • Optional integer weights: weights=[2, 1] expands into the rotation list [0, 0, 1], which cycles so that the first target receives two of every three requests (twice the traffic of the second). The default is equal weights.

  • Memory entries use the round-robin's identifier: the prompt_target_identifier on request and response pieces is the RoundRobinTarget's own ComponentIdentifier. This keeps memory entries consistent — a single conversation shows one identifier throughout. The hash of the inner target that actually handled each request is recorded in prompt_metadata["inner_target_identifier"] for traceability.

  • Eval hash unwrap mechanism (pyrit/identifiers/evaluation_identifier.py): added unwrap_child field to ChildEvalRule. When set, the eval hash computation "sees through" wrapper targets by substituting the first inner child before applying param filtering. This ensures scorer(round_robin([t1_east, t1_west])) produces the same eval hash as scorer(t1_east), making scoring results comparable regardless of whether a round-robin was used. Applied to ScorerEvaluationIdentifier (prompt_target child) and AtomicAttackEvaluationIdentifier (objective_target child).

  • Why the round-robin identifier on memory entries but unwrapping in the eval hash: these serve different purposes and operate at different layers. The prompt_target_identifier on memory entries answers "what component was responsible for this request?", and the answer is the RoundRobinTarget, since that is what the caller passed to the normalizer or scorer. Stamping inner target identifiers would create inconsistency within a single conversation (different turns showing different identifiers) and would require overriding _get_normalized_conversation_async to mutate message pieces, adding complexity for no functional gain. The inner target that actually handled each request is still traceable via prompt_metadata["inner_target_identifier"]. The eval hash, by contrast, answers a different question: "are these two scorer configurations behaviorally equivalent for grouping evaluation results?" For that purpose, what matters is not the wrapper but the underlying model, temperature, and top_p. The unwrap mechanism lives entirely in the eval hash computation layer and doesn't touch memory entries, identifiers, or runtime behavior. Keeping the two concerns separate means the memory layer stays simple (no hook overrides, no mutation) while the eval layer still groups results correctly regardless of whether a round-robin was used.

  • Prompt caching trade-off: switching targets mid-conversation defeats provider-side prompt prefix caching (e.g., OpenAI cached tokens). This is a cost/latency trade-off, not a correctness issue, and is documented in the class docstring.

  • Concurrency safety: the only shared mutable state is self._counter (the rotation index), which is mutated only in the synchronous _next_target() method. Under Python's asyncio cooperative concurrency model this is safe, because no other coroutine can run in the middle of a synchronous method. Crucially, the target is selected synchronously (into a local variable) before the inner target's _send_prompt_to_target_async is awaited, so even if another coroutine advances _counter while the first is waiting on the network call, the already-selected target reference is unaffected. Not safe for multi-threaded use, consistent with the rest of PyRIT's target classes.

  • Minimal override surface: only _send_prompt_to_target_async and _build_identifier are overridden. No override of _get_normalized_conversation_async or set_system_prompt — the base class handles both correctly since all memory operations are keyed by conversation_id and stamped with self.get_identifier().
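The rotation, delegation, metadata, and concurrency points above can be illustrated with a self-contained toy. This is a minimal sketch, not PyRIT's implementation: FakeTarget, WeightedRoundRobin, the dict-based memory, and the send(...) signatures are hypothetical stand-ins for PyRIT's targets, shared memory, and _send_prompt_to_target_async. It shows weights=[2, 1] expanding to [0, 0, 1], each inner target receiving the full conversation rebuilt from memory keyed by conversation_id, the handling target recorded under inner_target_identifier in the response metadata, and the target being chosen synchronously before the first await.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical message record: (role, text, metadata).
Message = Tuple[str, str, dict]


@dataclass
class FakeTarget:
    """Hypothetical stateless inner target: it sees only the history it is handed."""

    name: str

    async def send(self, history: List[Message]) -> str:
        await asyncio.sleep(0)  # yield to the event loop, as a real network call would
        return f"{self.name} received {len(history)} message(s)"


class WeightedRoundRobin:
    """Toy weighted round-robin wrapper; illustrative only, not PyRIT's RoundRobinTarget."""

    def __init__(
        self,
        targets: Sequence[FakeTarget],
        memory: Dict[str, List[Message]],
        weights: Optional[Sequence[int]] = None,
    ) -> None:
        weights = list(weights) if weights is not None else [1] * len(targets)
        if len(targets) < 2 or len(weights) != len(targets) or any(w <= 0 for w in weights):
            raise ValueError("need >= 2 targets and one positive weight per target")
        self._targets = list(targets)
        self._memory = memory  # shared store keyed by conversation_id, owned by no target
        # Expand weights into a rotation list, e.g. [2, 1] -> [0, 0, 1].
        self._rotation = [i for i, w in enumerate(weights) for _ in range(w)]
        self._counter = 0

    def _next_target(self) -> FakeTarget:
        # Synchronous: no await between reading and advancing the counter.
        index = self._rotation[self._counter % len(self._rotation)]
        self._counter += 1
        return self._targets[index]

    async def send(self, conversation_id: str, text: str) -> str:
        target = self._next_target()  # chosen before any await and held in a local
        turns = self._memory.setdefault(conversation_id, [])
        turns.append(("user", text, {}))
        history = list(turns)  # full history rebuilt from memory on every call
        reply = await target.send(history)
        # Record which inner target handled this turn in the response metadata.
        turns.append(("assistant", reply, {"inner_target_identifier": target.name}))
        return reply


async def demo() -> Tuple[List[str], Dict[str, List[Message]]]:
    memory: Dict[str, List[Message]] = {}
    rr = WeightedRoundRobin([FakeTarget("east"), FakeTarget("west")], memory, weights=[2, 1])
    replies = [await rr.send("conv-1", f"turn {i}") for i in range(3)]
    return replies, memory


if __name__ == "__main__":
    print(asyncio.run(demo())[0])
```

Because the toy inner targets are stateless and always receive the full history, alternating between east and west inside conv-1 does not change what either one sees; only the metadata reveals which target answered each turn.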

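The construction-time checks described in the design decisions (same concrete class, multi-turn and editable-history support, matching behavioral params with the model_name fallback, and the capability intersection) can be sketched as a single validation function. This is a hypothetical illustration, not PyRIT's code: FakeChatTarget, its attribute names, the supports_json_output flag, the modality strings, and BEHAVIORAL_PARAMS / PARAM_FALLBACKS (loose stand-ins for TARGET_BEHAVIORAL_PARAMS and TARGET_BEHAVIORAL_PARAM_FALLBACKS) are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Sequence

# Hypothetical mirrors of TARGET_BEHAVIORAL_PARAMS / TARGET_BEHAVIORAL_PARAM_FALLBACKS;
# the real PyRIT constants may be shaped differently.
BEHAVIORAL_PARAMS = ("underlying_model_name", "temperature", "top_p")
PARAM_FALLBACKS = {"underlying_model_name": "model_name"}


@dataclass
class FakeChatTarget:
    """Hypothetical inner target; attribute names are illustrative only."""

    endpoint: str
    model_name: str
    underlying_model_name: Optional[str] = None
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    supports_multi_turn: bool = True
    supports_editable_history: bool = True
    supports_json_output: bool = False  # hypothetical extra flag to show AND-ing
    input_modalities: frozenset = frozenset({"text", "image_path"})
    output_modalities: frozenset = frozenset({"text"})


def behavioral_value(target: Any, name: str) -> Any:
    """Read a behavioral param, falling back (e.g. to model_name) when it is unset."""
    value = getattr(target, name, None)
    if value is None and name in PARAM_FALLBACKS:
        value = getattr(target, PARAM_FALLBACKS[name], None)
    return value


def validate_and_intersect(targets: Sequence[FakeChatTarget]) -> Dict[str, Any]:
    """Construction-time checks, then the lower bound of the inner capabilities."""
    if len(targets) < 2:
        raise ValueError("need at least 2 inner targets")
    if len({type(t) for t in targets}) != 1:
        raise TypeError("all inner targets must be the same concrete class")
    for t in targets:
        if not (t.supports_multi_turn and t.supports_editable_history):
            raise ValueError(f"{t.endpoint}: multi-turn and editable history are required")
    for name in BEHAVIORAL_PARAMS:
        values = {behavioral_value(t, name) for t in targets}
        if len(values) > 1:
            raise ValueError(f"inner targets disagree on {name}: {values}")
    # Boolean flags are AND-ed; modality sets are intersected.
    inputs = frozenset.intersection(*(t.input_modalities for t in targets))
    outputs = frozenset.intersection(*(t.output_modalities for t in targets))
    if not inputs or not outputs:
        raise ValueError("inner targets share no common input or output modality")
    return {
        "supports_json_output": all(t.supports_json_output for t in targets),
        "input_modalities": inputs,
        "output_modalities": outputs,
    }
```

In this sketch, two deployments that differ only in endpoint pass validation, while a temperature mismatch raises ValueError. A target whose model_name is a deployment alias still matches when its underlying_model_name agrees with the others' effective model name.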
Tests and Documentation

  • Unit tests (tests/unit/prompt_target/test_round_robin_target.py): 24 tests covering:

    • Construction validation: rejects < 2 targets, mixed classes, mismatched weights, zero/negative weights
    • Capability intersection: boolean AND, modality intersection, empty modality rejection
    • Capability requirements: rejects targets without multi-turn, rejects targets without editable history
    • Round-robin selection: FIFO rotation, weighted rotation
    • Delegation: _send_prompt_to_target_async delegates to correct inner target, records inner_target_identifier in metadata, round-robins across calls
    • set_system_prompt: uses round-robin identifier (verified via memory lookup)
    • Identifier: includes children and weights
    • End-to-end: full send_prompt_async flow keeps round-robin identifier on entries
    • Behavioral validation: rejects mismatched underlying_model_name, rejects mismatched temperature, accepts matching params with different endpoints, uses model_name fallback
  • Eval hash unwrap tests (tests/unit/identifiers/test_evaluation_identifier.py): 3 tests added:

    • test_unwrap_substitutes_first_inner_child: verifies the unwrap produces the same hash as the direct target
    • test_unwrap_no_op_when_child_has_no_matching_subchild: verifies non-wrapper targets are unaffected
    • test_scorer_eval_hash_matches_with_and_without_round_robin: end-to-end ScorerEvaluationIdentifier equivalence
  • Documentation notebook (doc/code/targets/round_robin_target.ipynb and round_robin_target.py): 5 sections demonstrating:

    • Basic usage with alternation printing showing which target handled each request
    • Weighted distribution with count summary
    • Drop-in usage with PromptSendingAttack
    • Multi-turn attack (Crescendo) with round-robin objective target
    • Batch scoring with round-robin scorer target, printing which scorer target scored each prompt
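The unwrap behavior that the eval hash tests check (same hash as the direct target; no-op for non-wrapper targets) can be modeled in a few lines. This is a hedged toy, not PyRIT's EvaluationIdentifier code: only the idea of ChildEvalRule.unwrap_child comes from the PR, and here it is reduced to a plain string argument. Ident, unwrap, eval_hash, the "targets" child key, and hashing a JSON payload with SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical subset of behavioral params kept by the eval hash.
BEHAVIORAL_PARAMS = ("underlying_model_name", "temperature", "top_p")


@dataclass
class Ident:
    """Simplified, hypothetical stand-in for a component identifier."""

    class_name: str
    params: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)  # child name -> list of Ident


def unwrap(child: Ident, unwrap_child: Optional[str]) -> Ident:
    """Substitute the first inner child if `child` wraps others under `unwrap_child`."""
    if unwrap_child and child.children.get(unwrap_child):
        return child.children[unwrap_child][0]
    return child  # no-op for non-wrapper targets


def eval_hash(scorer: Ident, child_name: str, unwrap_child: Optional[str] = "targets") -> str:
    # Unwrap first, then filter params, mirroring the order described in the PR.
    target = unwrap(scorer.children[child_name][0], unwrap_child)
    # Keep only behavioral params; endpoints and other wiring are dropped.
    filtered = {k: target.params.get(k) for k in BEHAVIORAL_PARAMS}
    payload = {"scorer": scorer.class_name, "target": target.class_name, "params": filtered}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()
```

Here a scorer wrapping a round-robin of an east and a west deployment hashes the same as a scorer wrapping only the east deployment, because the endpoint is filtered out after unwrapping. Passing unwrap_child=None shows the hash changing when the wrapper is not seen through.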
