FEAT: Add FigStep SafeBench multimodal dataset loader #1787

Open: romanlutz wants to merge 3 commits into microsoft:main from romanlutz:romanlutz/figstep-dataset-loader

@romanlutz
Contributor

Summary

Adds a PyRIT seed dataset loader for FigStep (Gong et al., AAAI 2025 Oral) — a typographic-image jailbreak benchmark for vision-language models. The benchmark, SafeBench, contains 500 questions across 10 harmful topics with a 50-question SafeBench-Tiny subset used for the paper's headline experiments.

What's included

  • pyrit/datasets/seed_datasets/remote/figstep_dataset.py: _FigStepDataset, with module-level enums FigStepCategory (10 members matching the CSV's category_name casing) and FigStepVariant.
  • Two attack variants, both behind one loader:
    • FigStep (default): single typographic image of the numbered-list rewrite + the original benign carrier text prompt ("The image shows a list numbered 1, 2, and 3, but the items are empty...").
    • FigStep-Pro: the GPT-4V/OCR-evasion upgrade. Each question is rendered as 3–7 sub-images plus a longer per-row templated carrier prompt with {benign_sentence} substituted from benign_sentences_without_harmful_phase.csv. Only the tiny subset has pre-cut sub-images upstream, so this variant requires use_tiny=True (loader raises ValueError otherwise).
  • Group shape per row: SeedObjective(question) + N SeedPrompt(image_path) + SeedPrompt(text), all sharing one prompt_group_id and sequence=0. The original harmful question is preserved as the group objective so scorers can evaluate against it (the visible carrier text alone is benign).
  • Carrier prompts copied verbatim from upstream src/generate_prompts.py and README.md.
  • FigStep-Pro sub-images distributed only via sub-figures.zip → downloaded once and extracted to dbdata/seed-prompt-entries/figstep_pro_subfigures_<sha>/. Benign-sentences CSV fetched as raw text (the upstream file has unquoted commas like ,000 that break strict CSV parsing).
  • Registered in pyrit/datasets/seed_datasets/remote/__init__.py (auto-discovered via SeedDatasetProvider.__init_subclass__).
  • BibTeX entry gong2025figstep added to doc/references.bib; citation added to the alphabetical prose paragraph in doc/code/datasets/1_loading_datasets.{py,ipynb} and the hidden-citations list in doc/bibliography.md.
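
The group shape described above (one objective, N image seeds, one text seed, all sharing a group id and sequence=0) can be sketched roughly as follows. This is an illustration only: `Seed` and `build_group` are hypothetical stand-ins, not PyRIT's SeedObjective / SeedPrompt classes, and the file names are made up.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Seed:
    """Hypothetical stand-in for a PyRIT seed; not the real class."""

    kind: str  # "objective", "image_path", or "text"
    value: str
    prompt_group_id: uuid.UUID
    sequence: int = 0  # every seed in a group uses sequence 0


def build_group(question: str, image_paths: list[str], carrier_text: str) -> list[Seed]:
    """Build one row's group: objective + N image seeds + one text seed."""
    group_id = uuid.uuid4()  # shared by every seed in this row's group
    seeds = [Seed("objective", question, group_id)]
    seeds += [Seed("image_path", path, group_id) for path in image_paths]
    seeds.append(Seed("text", carrier_text, group_id))
    return seeds


# FigStep: a single typographic image per row (file names are illustrative).
figstep = build_group("<question>", ["q1.png"], "<carrier prompt>")
# FigStep-Pro: several sub-images per row.
figstep_pro = build_group("<question>", ["q1_0.png", "q1_1.png", "q1_2.png"], "<carrier prompt>")
```

Keeping the question as a separate objective entry is what lets a scorer judge the response against the original request rather than the benign-looking carrier text.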

Parameters

| Parameter | Default | Notes |
|---|---|---|
| use_tiny | True | Paper's headline experiments evaluate on SafeBench-Tiny. False loads the full 500-question SafeBench. |
| variant | FigStepVariant.FIGSTEP | FIGSTEP_PRO requires use_tiny=True. |
| categories | None | Filter by FigStepCategory members (e.g. [FigStepCategory.FRAUD]). |
| source / source_type | None / "public_url" | Standard overrides. |
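
The constraint between variant and use_tiny could be enforced along these lines. This is a minimal sketch, not the loader's code: `Variant` and `check_config` are hypothetical names, the enum values are assumed, and raising ValueError for a raw string is an assumption (the PR only states that raw strings are rejected).

```python
from enum import Enum


class Variant(Enum):
    """Hypothetical mirror of FigStepVariant; member values are assumed."""

    FIGSTEP = "figstep"
    FIGSTEP_PRO = "figstep_pro"


def check_config(use_tiny: bool = True, variant: Variant = Variant.FIGSTEP) -> None:
    """Validate a (use_tiny, variant) pair before any download happens."""
    # Only enum members are accepted; a raw string such as "figstep" is rejected.
    if not isinstance(variant, Variant):
        raise ValueError(f"variant must be a Variant member, got {variant!r}")
    # Pre-cut sub-images exist upstream only for the tiny subset.
    if variant is Variant.FIGSTEP_PRO and not use_tiny:
        raise ValueError("FIGSTEP_PRO requires use_tiny=True")


check_config()  # defaults: tiny subset, FigStep variant
check_config(use_tiny=False)  # full SafeBench, FigStep variant
check_config(variant=Variant.FIGSTEP_PRO)  # tiny subset, FigStep-Pro
```

Validating up front means an unsupported combination fails before the archive or CSV is fetched.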

Tests

tests/unit/datasets/test_figstep_dataset.py — 31 unit tests, all passing. Mocks _fetch_from_url and image helpers; covers default init, full/tiny URL routing, FIGSTEP_PRO + use_tiny=False ValueError, invalid enum and raw-string rejection, group structure for both variants, per-row metadata, category filter, empty-after-filter ValueError, missing-keys ValueError, failed-image skip, and sub-image discovery (sort / missing dir / wrong prefix).
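
The three sub-image discovery cases (sort / missing dir / wrong prefix) map onto a helper shaped roughly like the one below. It is a sketch under assumptions: `discover_sub_images` is a hypothetical name, the `<prefix>_<n>.png` naming is invented for illustration, and returning an empty list for a missing directory is one possible behavior, not necessarily what the loader does.

```python
from pathlib import Path


def discover_sub_images(directory: Path, prefix: str) -> list[Path]:
    """Return files in `directory` whose names start with `prefix`, sorted by name."""
    if not directory.is_dir():
        return []  # missing directory: nothing to discover
    matches = [
        path
        for path in directory.iterdir()
        if path.is_file() and path.name.startswith(prefix)  # skip other prefixes
    ]
    # Sort by file name so the sub-image order is deterministic.
    return sorted(matches, key=lambda path: path.name)
```

A plain name sort is only correct while each row has fewer than ten sub-images (3–7 here); a numeric sort key would be needed beyond that.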

Live fetch verification

Ran each configuration end-to-end against real upstream with cache=False (mirroring the asserts in tests/end_to_end/test_all_datasets.py):

| Configuration | Seeds | Result |
|---|---|---|
| figstep tiny default | 150 | PASS (50 × objective + image + text) |
| figstep full SafeBench | 1500 | PASS |
| figstep tiny + Fraud filter | 15 | PASS |
| figstep_pro tiny | 281 | PASS (50 objectives + 181 image pieces + 50 texts; avg ~3.6 sub-images/row) |
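
The seed counts follow directly from the group shape (one objective, N images, one text per row). A quick arithmetic check, where `figstep_seed_count` is a hypothetical helper and the five-row figure for the Fraud filter is inferred from the reported count of 15 rather than stated in the PR:

```python
def figstep_seed_count(rows: int, images_per_row: int = 1) -> int:
    """Seeds produced for `rows` rows: one objective + N images + one text each."""
    return rows * (1 + images_per_row + 1)


tiny = figstep_seed_count(50)    # 150 seeds for SafeBench-Tiny
full = figstep_seed_count(500)   # 1500 seeds for the full SafeBench
fraud = figstep_seed_count(5)    # 15 seeds, i.e. 5 tiny rows in the Fraud category
pro = 50 + 181 + 50              # 281 seeds: objectives + image pieces + texts
avg_sub_images = 181 / 50        # 3.62 sub-images per row on average
```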

Known limitation

The existing tests/end_to_end/test_all_datasets.py only parametrizes over each registered provider's default constructor, so figstep_pro and use_tiny=False aren't covered automatically. Every multi-config loader in the repo has the same limitation today; an audit / structural fix is being scoped in a separate session.

romanlutz and others added 3 commits May 22, 2026 14:56
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…aset-loader

# Conflicts:
#	doc/bibliography.md
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>