FEAT: Add FigStep SafeBench multimodal dataset loader #1787
Open
romanlutz wants to merge 3 commits into
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…aset-loader # Conflicts: # doc/bibliography.md
Summary
Adds a PyRIT seed dataset loader for FigStep (Gong et al., AAAI 2025 Oral), a typographic-image jailbreak for vision-language models. Its benchmark, SafeBench, contains 500 questions across 10 harmful topics, with a 50-question SafeBench-Tiny subset used for the paper's headline experiments.
What's included
- pyrit/datasets/seed_datasets/remote/figstep_dataset.py: _FigStepDataset with module-level enums FigStepCategory (10 members matching the CSV's category_name casing) and FigStepVariant.
- … ("The image shows a list numbered 1, 2, and 3, but the items are empty...").
- … {benign_sentence} substituted from benign_sentences_without_harmful_phase.csv. Only the tiny subset has pre-cut sub-images upstream, so this variant requires use_tiny=True (loader raises ValueError otherwise).
- … SeedObjective(question) + N SeedPrompt(image_path) + SeedPrompt(text), all sharing one prompt_group_id and sequence=0. The original harmful question is preserved as the group objective so scorers can evaluate against it (the visible carrier text alone is benign).
- … src/generate_prompts.py and README.md.
- … sub-figures.zip → downloaded once and extracted to dbdata/seed-prompt-entries/figstep_pro_subfigures_<sha>/. Benign-sentences CSV fetched as raw text (the upstream file has unquoted commas like ,000 that break strict CSV parsing).
- … pyrit/datasets/seed_datasets/remote/__init__.py (auto-discovered via SeedDatasetProvider.__init_subclass__).
- gong2025figstep added to doc/references.bib; citation added to the alphabetical prose paragraph in doc/code/datasets/1_loading_datasets.{py,ipynb} and the hidden-citations list in doc/bibliography.md.

Parameters
| Parameter | Default | Notes |
| --- | --- | --- |
| use_tiny | True | False loads the full 500-question SafeBench. |
| variant | FigStepVariant.FIGSTEP | FIGSTEP_PRO requires use_tiny=True. |
| categories | None | FigStepCategory members (e.g. [FigStepCategory.FRAUD]). |
| source / source_type | None / "public_url" | |

Tests
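The parameter rules above, which the unit tests also exercise, can be pictured with a minimal sketch. It is not the loader's actual code: validate_figstep_args is a hypothetical helper, the enum member values are placeholders, and only FRAUD of the 10 categories is shown. The PR confirms ValueError only for FIGSTEP_PRO with use_tiny=False; raising ValueError for the other two checks is an assumption.

```python
from enum import Enum
from typing import Optional, Sequence


class FigStepVariant(Enum):
    # Member values are placeholders; the PR does not show the real ones.
    FIGSTEP = "figstep"
    FIGSTEP_PRO = "figstep_pro"


class FigStepCategory(Enum):
    # Only one of the 10 members is named in the PR; its value is assumed.
    FRAUD = "Fraud"


def validate_figstep_args(
    use_tiny: bool = True,
    variant: FigStepVariant = FigStepVariant.FIGSTEP,
    categories: Optional[Sequence[FigStepCategory]] = None,
) -> None:
    """Hypothetical stand-in for the loader's argument checks."""
    # Reject raw strings and anything else that is not an enum member.
    if not isinstance(variant, FigStepVariant):
        raise ValueError(f"variant must be a FigStepVariant, got {variant!r}")
    # Only the tiny subset has pre-cut sub-images upstream.
    if variant is FigStepVariant.FIGSTEP_PRO and not use_tiny:
        raise ValueError("FIGSTEP_PRO requires use_tiny=True")
    # categories=None means no filtering; otherwise every entry must be an enum member.
    for category in categories or []:
        if not isinstance(category, FigStepCategory):
            raise ValueError(
                f"categories must be FigStepCategory members, got {category!r}"
            )
```

Calling validate_figstep_args() with the defaults passes silently, while variant=FigStepVariant.FIGSTEP_PRO together with use_tiny=False raises ValueError.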
tests/unit/datasets/test_figstep_dataset.py: 31 unit tests, all passing. Mocks _fetch_from_url and image helpers; covers default init, full/tiny URL routing, FIGSTEP_PRO + use_tiny=False ValueError, invalid enum and raw-string rejection, group structure for both variants, per-row metadata, category filter, empty-after-filter ValueError, missing-keys ValueError, failed-image skip, and sub-image discovery (sort / missing dir / wrong prefix).

Live fetch verification
Ran each configuration end-to-end against real upstream with cache=False (mirroring the asserts in tests/end_to_end/test_all_datasets.py):
- figstep: tiny default
- figstep: full SafeBench
- figstep: tiny + Fraud filter
- figstep_pro: tiny

Known limitation
The existing tests/end_to_end/test_all_datasets.py only parametrizes over each registered provider's default constructor, so figstep_pro and use_tiny=False aren't covered automatically. This is the same limitation every multi-config loader in the repo has today; an audit / structural fix is being scoped in a separate session.
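For reviewers, the per-row group layout described under What's included (one objective, N image prompts, one text prompt, a shared prompt_group_id, and sequence=0) can be illustrated with a small sketch. The Seed dataclass and build_group function are hypothetical stand-ins, not PyRIT's SeedObjective or SeedPrompt classes, and the field names are assumptions chosen to mirror the description.

```python
import uuid
from dataclasses import dataclass
from typing import List


@dataclass
class Seed:
    """Stand-in for one seed entry; NOT PyRIT's real SeedObjective/SeedPrompt."""

    kind: str  # "objective" or "prompt"
    data_type: str  # "text" or "image_path"
    value: str
    prompt_group_id: uuid.UUID
    sequence: int = 0  # every member of the group sits at sequence 0


def build_group(question: str, image_paths: List[str], carrier_text: str) -> List[Seed]:
    """Build one group: objective + N image prompts + one text prompt."""
    group_id = uuid.uuid4()  # shared by every seed in this group
    # The original question is kept as the objective so scorers can use it.
    seeds = [Seed("objective", "text", question, group_id)]
    # One image prompt per image (a single image, or N pre-cut sub-images).
    seeds += [Seed("prompt", "image_path", path, group_id) for path in image_paths]
    # The visible carrier text goes last.
    seeds.append(Seed("prompt", "text", carrier_text, group_id))
    return seeds


# Placeholder inputs; the file names are invented for illustration.
group = build_group("<question>", ["sub_0.png", "sub_1.png"], "<carrier text>")
```

With two sub-images, the group holds four entries that all carry the same prompt_group_id, which is what lets a scorer tie the benign-looking carrier text back to the preserved objective.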