Summary
add a committed enhanced_cps_manifest_2025.json with row counts, checksums, git blob SHAs, exchange-rate/build assumptions, loss diagnostics, and weight diagnostics
add fast contract tests that compare the manifest, README, source CSV, H5 row counts, checksums, and weight diagnostics
document public transfer methodology/limitations and clarify that policybench_transfer_* files are legacy 1k artifacts while Python aliases point to the current enhanced_cps builder
add a separate manual workflow and Make target for uploading only the public transfer artifacts to the public Hugging Face repo
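The checksum and git-blob-SHA part of the manifest contract described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the manifest layout (an `artifacts` list with `filename`, `size_bytes`, `sha256`, and `git_blob_sha` keys) and the helper names are hypothetical. The only fixed facts it relies on are the standard SHA-256 digest and Git's blob hashing rule (SHA-1 over `blob <size>\0` plus the raw bytes).

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Plain SHA-256 checksum of the file content."""
    return hashlib.sha256(data).hexdigest()


def git_blob_sha1(data: bytes) -> str:
    """SHA-1 that `git hash-object` would report for this content."""
    # Git hashes the header "blob <size>\0" followed by the raw bytes.
    header = f"blob {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


def describe_artifact(path: Path) -> dict:
    """Build one manifest entry (hypothetical schema) for a file on disk."""
    data = path.read_bytes()
    return {
        "filename": path.name,
        "size_bytes": len(data),
        "sha256": sha256_hex(data),
        "git_blob_sha": git_blob_sha1(data),
    }


def check_manifest(manifest: dict, directory: Path) -> list[str]:
    """Return mismatch descriptions; an empty list means the files match."""
    problems = []
    for entry in manifest["artifacts"]:
        actual = describe_artifact(directory / entry["filename"])
        for key in ("size_bytes", "sha256", "git_blob_sha"):
            if entry[key] != actual[key]:
                problems.append(f"{entry['filename']}: {key} mismatch")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        # Stand-in file; the real artifacts are the committed CSV and H5.
        (directory / "example.csv").write_bytes(b"household_id,weight\n1,2.5\n")
        manifest = {"artifacts": [describe_artifact(directory / "example.csv")]}
        print(json.dumps(manifest, indent=2))
        print(check_manifest(manifest, directory))
```

A contract test in this style would assert that `check_manifest` returns an empty list for the committed files, so that any silent rebuild of an artifact fails fast without loading the full dataset.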
Why
The public transfer artifact is now part of the PolicyBench path, but its provenance should not depend on the private eFRS deployment workflow. This keeps public artifact publication separate from private UKDS-derived artifacts and makes misuse harder.
✅ Safe to merge from a data-protection standpoint. This PR creates a parallel public upload path that touches none of the FRS-derived plumbing CLAUDE.md protects. Code quality is solid; tests are unusually thorough.
A few things worth a look before merge:
Token scope. The workflow uses secrets.HUGGING_FACE_TOKEN. If that token has write access to -private, a future swap of PUBLIC_REPO ↔ PRIVATE_REPO could leak. Two follow-ups worth considering: un-xfail test_every_hf_upload_routes_through_guard_constants once the remaining literal call sites are migrated, and/or provision a dedicated public-only token.
POLICYBENCH_TRANSFER_SOURCE_FILE = ENHANCED_CPS_SOURCE_FILE is a footgun. Anyone importing the legacy alias silently gets the 28k CSV rather than the 1k file. Documented in the docstring, but either a louder warning or a runtime DeprecationWarning on import would surface this to consumers earlier.
Legacy policybench_transfer_2025.h5 provenance. Was the 1,000-row legacy artifact ever FRS-derived in an earlier commit? Worth a one-liner confirmation from whoever produced it. Not a regression here.
Repo size. enhanced_cps_2025.h5 is committed to git. If it's >50MB, git lfs would be friendlier; if it's small, ignore.
Minor: _pick_region uses household_id * 2654435761 — a # Knuth multiplicative hash comment would help future readers.
Validation
UV_FROZEN=0 uv run --python 3.13 ruff check policyengine_uk_data/datasets/enhanced_cps.py policyengine_uk_data/datasets/__init__.py policyengine_uk_data/datasets/policybench_transfer.py policyengine_uk_data/utils/enhanced_cps_manifest.py policyengine_uk_data/storage/write_enhanced_cps_manifest.py policyengine_uk_data/storage/upload_public_transfer_dataset.py policyengine_uk_data/tests/test_enhanced_cps_artifact_manifest.py
git diff --check
UV_FROZEN=0 uv run --python 3.13 pytest policyengine_uk_data/tests/test_enhanced_cps_artifact_manifest.py policyengine_uk_data/tests/test_policybench_transfer.py policyengine_uk_data/tests/test_release_manifest.py policyengine_uk_data/tests/test_hf_destinations.py