FEAT: Backfill class-level metadata for all remote seed datasets #1780

Open
romanlutz wants to merge 4 commits into microsoft:main from romanlutz:romanlutz/backfill-dataset-metadata

@romanlutz

What

Adds class-level tags, size, and modalities to every remote seed dataset loader so they participate in SeedDatasetFilter discovery (e.g. SeedDatasetFilter(tags={"default"})). Before this change, only 5 of ~33 remote loaders declared metadata, so the others were silently skipped by metadata-driven filtering.
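To make the "silently skipped" behavior concrete, here is a minimal sketch of metadata-driven filtering. `ToyLoader`, `TaggedLoader`, `UntaggedLoader`, `matches_tags`, and `discover` are hypothetical stand-ins, not PyRIT's `SeedDatasetFilter` implementation. The sketch also assumes that a filter matches when the requested tags overlap the declared class-level `tags`; the real matching rule may differ.

```python
from typing import Optional


class ToyLoader:
    """Hypothetical stand-in for a remote loader; not PyRIT's real base class."""

    # Class-level metadata; None means "never declared".
    tags: Optional[frozenset] = None


class TaggedLoader(ToyLoader):
    tags = frozenset({"default", "safety"})


class UntaggedLoader(ToyLoader):
    pass  # predates the metadata schema, so nothing is declared


def matches_tags(loader_cls: type, wanted: set) -> bool:
    """Return True only if the loader declares tags that overlap `wanted`."""
    declared = getattr(loader_cls, "tags", None)
    if not declared:
        # No metadata: the loader cannot match any tag filter and is skipped.
        return False
    return bool(declared & wanted)


def discover(loaders: list, wanted: set) -> list:
    """Return the names of the loaders selected by a tag filter."""
    return [cls.__name__ for cls in loaders if matches_tags(cls, wanted)]


selected = discover([TaggedLoader, UntaggedLoader], {"default"})
print(selected)
```

Under these assumptions the untagged loader never appears in the result, no matter which tags are requested, which is the gap the backfill closes.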

This is the follow-up to the review discussion on #1757 (cc @jsong468), where the reviewer asked why some loaders declared these fields and others didn't. Answer: most predate the metadata schema and just hadn't been backfilled.

How

  1. Pinned a canonical, advisory tag vocabulary as RECOMMENDED_TAGS in pyrit/datasets/seed_datasets/seed_metadata.py. Users can still set custom tags: the metadata parser does not enforce the vocabulary, but a new parametrized coverage test enforces it for the built-in remote loaders.
  2. Documented the 5-condition rule for the special default tag inline in seed_metadata.py:
    1. Ungated — no HF token, API key, auth, or signup.
    2. Citable — peer-reviewed paper / established benchmark.
    3. Single-call: await loader.fetch_dataset_async() works with no manual setup.
    4. Size >= medium (>=100 prompts).
    5. Broadly applicable — not narrowly scoped to a vertical (medical, legal, cybersecurity). Cross-cutting axes like privacy, bias, multimodal, multilingual, refusal, and jailbreak DO count.
  3. Walked every remote loader and assigned size / tags / modalities based on the loader's docstring, tests, and upstream dataset card. Added inline # N prompts comments next to each size so reviewers can verify the bucket choice locally.
  4. Renamed the non-canonical multilingual_culture tag on _SGXSTestDataset to multilingual and dropped its default tag (SGXSTest is gated on HF).
  5. Marked _ORBenchBaseDataset as should_register = False (it has no usable dataset_name) and explicitly opted the three OR-Bench leaf classes back in.
  6. Added TestRemoteLoaderMetadataCoverage — a parametrized test that walks every concrete _RemoteDatasetLoader subclass via auto-registration and asserts: metadata is present, tags/size/modalities are non-empty, size is in SeedDatasetSizeCategory, and tags is a subset of RECOMMENDED_TAGS (catches future typos like multilingual_culture).
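Steps 5 and 6 can be sketched together, since the coverage check walks whatever auto-registration collected. Everything below is a hypothetical illustration: the `Toy*` classes, `REGISTRY`, `SIZE_CATEGORIES`, and `metadata_problems` are invented for this sketch, the tag vocabulary is assembled from the tags that appear in the backfill table rather than copied from the real `RECOMMENDED_TAGS`, and registration through `__init_subclass__` is an assumption about how auto-registration could work, not a description of PyRIT's code.

```python
from typing import ClassVar, Optional

# Assumption: vocabulary assembled from the tags in this PR's backfill table;
# the real RECOMMENDED_TAGS constant may differ.
RECOMMENDED_TAGS = frozenset({
    "default", "safety", "jailbreak", "refusal", "bias", "multilingual",
    "multimodal", "multiturn", "medical", "cybersecurity", "synthetic",
})
# Assumption: the four size buckets used in the backfill table.
SIZE_CATEGORIES = frozenset({"small", "medium", "large", "huge"})

REGISTRY: dict = {}


class ToyRemoteLoader:
    """Hypothetical base class that auto-registers its subclasses."""

    should_register: ClassVar[bool] = True
    tags: ClassVar[Optional[frozenset]] = None
    size: ClassVar[Optional[str]] = None
    modalities: ClassVar[tuple] = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # should_register is inherited, so children of an opted-out base
        # stay out unless they set it back to True themselves.
        if cls.should_register:
            REGISTRY[cls.__name__] = cls


class ToyORBenchBase(ToyRemoteLoader):
    should_register = False  # shared plumbing only, not a usable loader


class ToyORBenchHard(ToyORBenchBase):
    should_register = True  # leaf class explicitly opts back in
    tags = frozenset({"default", "safety", "refusal"})
    size = "large"
    modalities = ("text",)


class ToyTypoLoader(ToyRemoteLoader):
    tags = frozenset({"safety", "multilingual_culture"})  # non-canonical tag
    size = "medium"
    modalities = ("text",)


def metadata_problems(cls: type) -> list:
    """Collect human-readable metadata violations for one loader class."""
    problems = []
    if not cls.tags:
        problems.append("tags missing or empty")
    elif not cls.tags <= RECOMMENDED_TAGS:
        unknown = sorted(cls.tags - RECOMMENDED_TAGS)
        problems.append(f"non-canonical tags: {unknown}")
    if cls.size not in SIZE_CATEGORIES:
        problems.append(f"size {cls.size!r} is not a known category")
    if not cls.modalities:
        problems.append("modalities missing or empty")
    return problems


report = {name: metadata_problems(cls) for name, cls in REGISTRY.items()}
print(report)
```

In a pytest suite, each registry entry would typically become one case via `pytest.mark.parametrize`, with the test asserting that the problem list is empty, so a typo such as `multilingual_culture` fails with the loader's name in the test ID.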

Class-level harm_categories is intentionally deferred — per-row SeedPrompt.harm_categories already labels individual prompts; picking a "broadest" class-level summary is a judgment call better made by domain owners in a focused follow-up.

Backfill table

| Loader | size | modalities | tags | default? |
|---|---|---|---|---|
| _AegisContentSafetyDataset | huge | (text,) | {default, safety} | YES |
| _AyaRedteamingDataset | medium | (text,) | {safety, multilingual} | no (per-language ~few hundred) |
| _BabelscapeAlertDataset | huge | (text,) | {default, safety, jailbreak} | YES |
| _BeaverTailsDataset | huge | (text,) | {default, safety} | YES |
| _CBTBenchDataset | medium | (text,) | {safety, medical} | no (vertical) |
| _CCPSensitivePromptsDataset | small | (text,) | {safety, multilingual} | no |
| _DarkBenchDataset | medium | (text,) | {default, safety} | YES |
| _EquityMedQADataset | medium | (text,) | {safety, bias, medical} | no (vertical) |
| _ForbiddenQuestionsDataset | medium | (text,) | {default, safety, jailbreak} | YES |
| _HarmBenchMultimodalDataset | small | (text, image) | {safety, jailbreak, multimodal} | no |
| _HarmfulQADataset | large | (text,) | {default, safety, jailbreak} | YES |
| _JBBBehaviorsDataset | small | (text,) | {safety, jailbreak} | no |
| _LibrAIDoNotAnswerDataset | medium | (text,) | {default, safety, refusal} | YES |
| _LLMLatentAdversarialTrainingDataset | large | (text,) | {default, safety, jailbreak} | YES |
| _MedSafetyBenchDataset | large | (text,) | {safety, medical} | no (vertical) |
| _MLCommonsAILuminateDataset | large | (text,) | {default, safety} | YES |
| _MultilingualVulnerabilityDataset | medium | (text,) | {default, safety, multilingual} | YES |
| _ORBench80KDataset | huge | (text,) | {default, safety, refusal} | YES |
| _ORBenchHardDataset | large | (text,) | {default, safety, refusal} | YES |
| _ORBenchToxicDataset | large | (text,) | {default, safety, refusal} | YES |
| _PKUSafeRLHFDataset | huge | (text,) | {default, safety} | YES |
| _PromptIntelDataset | medium | (text,) | {safety, jailbreak, cybersecurity} | no (API key) |
| _RedTeamSocialBiasDataset | small | (text,) | {safety, bias, multiturn} | no |
| _SaladBenchDataset | huge | (text,) | {default, safety, jailbreak} | YES |
| _SimpleSafetyTestsDataset | small | (text,) | {safety} | no |
| _SorryBenchDataset | large | (text,) | {safety, jailbreak, synthetic} | no (gated) |
| _SOSBenchDataset | large | (text,) | {safety, medical, cybersecurity} | no (vertical) |
| _TDC23RedteamingDataset | small | (text,) | {safety, jailbreak} | no |
| _ToxicChatDataset | huge | (text,) | {default, safety, multiturn} | YES |
| _TransphobiaAwarenessDataset | medium | (text,) | {default, safety, bias} | YES |
| _VLGuardDataset | large | (text, image) | {safety, multimodal} | no (gated) |
| _VLSUMultimodalDataset | large | (text, image) | {default, safety, multimodal} | YES |
| _XSTestDataset | medium | (text,) | {default, safety, refusal} | YES |
| _SGXSTestDataset (fixed) | medium | (text,) | {safety, multilingual} | no (gated; was {default, safety, multilingual_culture}) |

Already-tagged (unchanged): _HarmBenchDataset, _ComicJailbreakDataset, _VisualLeakBenchDataset.
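The default? column follows from the 5-condition rule, which can be read as a plain conjunction. The sketch below is illustrative only: `DefaultTagFacts`, `SIZE_ORDER`, and `qualifies_for_default` are hypothetical names, the bucket ordering is assumed from the names used in the table, and for each example only the fact named in the table's reason (or the size bucket) is taken from this PR, with the remaining booleans set to `True` for illustration.

```python
from dataclasses import dataclass

# Assumption: buckets ordered smallest to largest; this PR only states that
# "medium" starts at 100 prompts.
SIZE_ORDER = ("small", "medium", "large", "huge")


@dataclass(frozen=True)
class DefaultTagFacts:
    """Hypothetical record of the five facts the rule asks about."""

    ungated: bool             # no HF token, API key, auth, or signup
    citable: bool             # peer-reviewed paper or established benchmark
    single_call: bool         # fetch works with no manual setup
    size: str                 # one of SIZE_ORDER
    broadly_applicable: bool  # not scoped to a single vertical


def qualifies_for_default(facts: DefaultTagFacts) -> bool:
    """All five conditions must hold for a loader to carry the default tag."""
    big_enough = SIZE_ORDER.index(facts.size) >= SIZE_ORDER.index("medium")
    return all((
        facts.ungated,
        facts.citable,
        facts.single_call,
        big_enough,
        facts.broadly_applicable,
    ))


examples = {
    # medium and marked YES in the table
    "_XSTestDataset": DefaultTagFacts(True, True, True, "medium", True),
    # gated on HF, so condition 1 fails
    "_SGXSTestDataset": DefaultTagFacts(False, True, True, "medium", True),
    # medical vertical, so condition 5 fails
    "_MedSafetyBenchDataset": DefaultTagFacts(True, True, True, "large", False),
    # small bucket, so condition 4 fails
    "_SimpleSafetyTestsDataset": DefaultTagFacts(True, True, True, "small", True),
}
results = {name: qualifies_for_default(f) for name, f in examples.items()}
print(results)
```

Because the rule is a conjunction, a single failing condition is enough to withhold the tag, which is why the table lists one short reason per "no" row.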

No-op

  • No public API changes
  • No runtime behavior changes
  • No changes to per-row SeedPrompt.harm_categories
  • The 4 already-tagged loaders are not migrated to immutable frozenset / tuple style (cosmetic — out of scope)

Discussion link

#1757

Adds class-level `tags`, `size`, and `modalities` to all remote seed dataset loaders so they participate in `SeedDatasetFilter` discovery.

Pins a recommended tag vocabulary and the 5-condition rule for the special `default` tag in `seed_metadata.py` as a soft contract, and enforces it via a new parametrized coverage test in `test_seed_dataset_provider.py`.

Also renames `_SGXSTestDataset`'s non-canonical `multilingual_culture` tag to `multilingual` and drops `default` (the dataset is gated), and gates `_ORBenchBaseDataset` from auto-registration since it is not a usable loader on its own.

No runtime behavior or public API changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@behnam-o left a comment

A couple of minor comments, but it also looks good as is.

Comment thread pyrit/datasets/seed_datasets/remote/harmbench_multimodal_dataset.py Outdated
Comment thread pyrit/datasets/seed_datasets/remote/forbidden_questions_dataset.py Outdated
romanlutz and others added 3 commits May 22, 2026 13:20
…uckets

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…taset-metadata

# Conflicts:
#	pyrit/datasets/seed_datasets/remote/comic_jailbreak_dataset.py
#	pyrit/datasets/seed_datasets/remote/harmbench_multimodal_dataset.py
#	pyrit/datasets/seed_datasets/remote/visual_leak_bench_dataset.py
#	pyrit/datasets/seed_datasets/remote/vlsu_multimodal_dataset.py