Restructuring: top-level layout + workflow/papers/scratch division by cailmdaley · Pull Request #197 · CosmoStat/sp_validation

cailmdaley · 2026-06-05T20:31:19Z

Restructuring `sp_validation`

Supersedes #188 (closed as a side-effect of a branch rename — all work is intact on this branch; only the PR wrapper was lost).

Status — WIP draft, not yet for review.

Foundation folded in. Sacha's sachaguer:develop (head of PR #192, "Merge Sacha's fork with fiducial sp_validation"; 120 files: paper plots,
harmonic configs, library changes) merged; the cosmology.py KeyError: 'mnu' blocker fixed
(one line) — test_cosmology.py 26/26 green. Sacha's broad .gitignore bans
(*.png *.sh *.fits) were not adopted.

Back-pressure guard suite all green — characterization guards that must stay green as
files move: ① imports + standalone-scripts/ resolution, ② snakemake -n passes,
③ config-path existence (Candide-local), ⑤ symlink integrity, ⑥ dangling-reference grep.
Full suite 86 passed, 0 skipped; the move-map guard is active with five registered moves.

Phase 2 — the moves — COMPLETE. The tree now has the target top-level shape:

workflow/ + papers/{bmodes,catalog,harmonic} — the bmodes split (generic compute base
composed via Snakemake module; paper layers on top). pure_eb run dir repointed;
all_tapestry dry-runs cleanly from both locations.

cosmo_val/ promoted from notebooks/ (code + config home beside cosmo_inference/);
every tracked reference swept notebooks/cosmo_val → cosmo_val, on-disk outputs moved
along so candide-absolute paths stay live.

scratch/ added (tracked, per-person; conventions in its README), one top-level
results/ (contents gitignored, dir kept), root output/ ignored, the dead hand-listed
notebook block dropped from .gitignore.

Cleanup begun: defunct/ deleted; nbstripout + 2 MB large-file pre-commit hooks land
the bloat discipline (activation: pre-commit install, see CONTRIBUTING).

Remaining: fold glass_mock core into src/ (code-level refactor, own pass); curate
notebooks/ to official demos (which reduction notebooks become scripts/ — review with
Martin); branch/milestone tidy-up with Cail (closure of #188, "Restructuring proposal: top-level layout + workflow/paper/scratch division", and #189, "Foundation: merge pending local code into develop").

develop is untouched. Live state is tracked in the sp-validation-restructuring fiber.

— Claude on behalf of Cail

One organizing principle (the things you run live at the top), a clean three-way
split between analysis, papers, and scratch, and a modular workflow built for more than one
person.

The shape

Today cosmo_val is buried inside notebooks/ while cosmo_inference/ is top-level, so
you constantly hunt for where each one lives. The fix: the things a person actually runs
sit side by side at the top, sharing library code in src/ underneath.

sp_validation/
├── src/sp_validation/   library code (+ glass_mock core)
├── cosmo_val/           validation: code + config        (promoted from notebooks/)
├── cosmo_inference/     inference: code + config         (cosmosis / cosmocov)
├── workflow/            ALL analysis — modular Snakemake, multi-person → results/
├── papers/             final-figure assembly only (PDF, colour, layout)
├── scripts/            real reduction scripts (catalog builders, masking)
├── scratch/            per-person — ad hoc work + personal workflows (tracked)
├── notebooks/          curated to official demos / tutorials
├── results/            analysis products + diagnostic plots (contents gitignored)
└── docs/  tests/  config/

Division of labor

The boundary is the inputs to a paper figure: everything up to that point is analysis;
the figure itself is presentation.

- workflow/: all analysis. Generic, reusable, modular, organized for multiple people.
  Produces analysis products and diagnostic plots (sp_validation makes many; they go to
  results/). The bulk of the work lives here.
- papers/<paper>/: final-figure assembly only. The figure PDF, colours, layout,
  recombining data for presentation. Tied to one paper, and may never touch Snakemake.
- scratch/<person>/: personal and ad hoc. Experiments and one-off custom workflows.
  Tracked, because seeing each other's scratch is useful.

How the workflow scales — modular, not monolithic

Nothing in this analysis is computed once: the catalog changed ~20× in the first release
suite, and every paper varies the data vector, covariance, and inference. So the workflow
is parameterized — the rules are shared, the config changes each time. Snakemake's
module directive imports the rules under your own config and an output prefix, and lets
you override any single rule:

module analysis:
    snakefile: "../../workflow/Snakefile"
    config:    config              # this run's catalog, cuts, blind
    prefix:    "results/bmodes"    # products land here — no clobbering

use rule * from analysis
# swap is per-rule: redefine just the data-vector rule to override it

One top-level results/; each run namespaces under results/<name>/ via the prefix, so
people don't clobber each other. A --dry-run on each composition is the safety net that
lets the structure grow without silent breakage.

Cleanup

- Delete defunct/ (quarantined since 2024) and the exploratory 2021–22 notebooks; it
  all stays in git history.
- Curate notebooks/ to official demos and tutorials; personal scratchy ones move to
  scratch/.
- Discipline via tooling, not bans: nbstripout strips notebook outputs on commit (the
  repo's weight today is committed notebook outputs), plus a pre-commit size hook.
- Path translation: collecting the paper dirs breaks ~35 hardcoded absolute paths; a
  mechanical sweep rewrites them (scripts included) to the single repo-relative results/.
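
The path-translation sweep in the last item could look roughly like the sketch below. It is an illustration only, not the script used on this branch: the `OLD_ROOTS` prefixes and the `translate_paths` helper are hypothetical, and the real absolute paths are not listed in this description.

```python
import re

# Hypothetical absolute prefixes that used to point at per-paper output dirs.
OLD_ROOTS = [
    "/abs/path/to/papers/bmodes/results",
    "/abs/path/to/papers/catalog/results",
]


def translate_paths(text, old_roots=OLD_ROOTS, new_root="results"):
    """Rewrite each hardcoded absolute prefix to the repo-relative results dir."""
    # Longest prefix first, so a nested root is not shadowed by its parent.
    for root in sorted(old_roots, key=len, reverse=True):
        # The negative lookahead keeps "results_old" and similar names untouched.
        pattern = re.escape(root.rstrip("/")) + r"(?![\w.-])"
        text = re.sub(pattern, new_root, text)
    return text


line = 'cov = "/abs/path/to/papers/bmodes/results/cov/xi.fits"'
print(translate_paths(line))
```

Running the sweep over file contents rather than over individual path strings keeps it mechanical: the same function applies to configs, scripts, and Snakefiles alike.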

The milestone

A suite of PRs, in sequence:

1. Foundation: merge pending local code into develop. (Sacha; folded into this branch)
2. Restructuring: this PR, proposal + implementation behind the guard suite.
3. Glass mocks → tomography.
4. Input pipeline → tomography.

— Claude on behalf of Cail

…tation

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Fold Sacha's pending foundation (PR #192 head, sachaguer:develop @ c22f075) onto current develop so the restructuring builds on his foundation without racing his merge gesture (Cail's direction, 2026-06-05). .gitignore conflict resolved in favour of develop: kept the .felt tracking block, rejected sacha's broad cluster bans (*.png *.sh *.fits *.out *.err) — those get narrowed during the restructuring gitignore pass, not adopted wholesale. cosmo_val.py / cat_config.yaml auto-merged cleanly (origin's docstring-RST polish + sacha's functional changes did not collide). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cosmology.py get_cosmo read planck_defaults["mnu"] but the dict never defined the key, so every bare get_cosmo() call (no ccl_params, no mnu arg) raised KeyError: 'mnu'. Add "mnu": PLANCK18["m_nu"] (0.06 eV). Verified: test_cosmology.py 26/26 pass (was immediate KeyError before). This is the one blocker that kept Sacha's foundation from running clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
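
A toy reproduction of the failure mode described in this commit, assuming a simplified `get_cosmo` that only handles the neutrino mass. The function body and the `make_defaults` helper are hypothetical stand-ins; only the key names and the 0.06 eV value come from the commit message.

```python
PLANCK18 = {"m_nu": 0.06}  # eV; the only value quoted in the commit message


def make_defaults(include_mnu):
    """Build a stand-in for planck_defaults, with or without the 'mnu' key."""
    defaults = {}
    if include_mnu:
        defaults["mnu"] = PLANCK18["m_nu"]  # the one-line fix
    return defaults


def get_cosmo(planck_defaults, mnu=None):
    """Toy get_cosmo: fall back to the default neutrino mass when none is given."""
    if mnu is None:
        mnu = planck_defaults["mnu"]  # KeyError here when the key is missing
    return {"mnu": mnu}


try:
    get_cosmo(make_defaults(include_mnu=False))
except KeyError as err:
    print("before the fix: KeyError", err)

print("after the fix:", get_cosmo(make_defaults(include_mnu=True)))
```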

docs/source/sp_validation.*.rst are regenerated on every docs build by sphinx-apidoc (deploy-docs.yml: `sphinx-apidoc -feTMo docs/source src/sp_validation`), matching the already-ignored fortuna.*/scripts.* stubs — they should never be committed. uv.lock: the container is the canonical runtime (CLAUDE.md), the lockfile has never been tracked, so ignore it rather than make an unowned pinned-dep commitment. One-line flip to track if we decide to pin. Establishes a clean base for the restructuring branch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Sacha's branch removed the cosmosis_pipeline_glass_mock_0*.ini and _v0*.ini ignore patterns, which un-ignored ~700 generated glass-mock pipeline configs in cosmo_inference/cosmosis_config/. Restore the two specific patterns (not broad bans) so the tree returns to develop's clean state. These are generated artifacts, never tracked. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Pin metacal response-matrix recovery (injected finite-difference slopes -> R=[[2,0],[0,3]]; the 2*step normalization is load-bearing), the size/SNR boolean masks (strict-bound exclusion), and jackknif_weighted_average2 (seeded mean + error). Tight rtol 1e-10 with teeth (injected-slope change moves R; tightened bounds flip mask elements; a weight change moves the jackknife mean). Environment-independent (hand-built NGMIX catalog + seeded RNG). 5 tests, green in-container; adversarially reviewed clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
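
The response-matrix pin above can be illustrated with a small finite-difference sketch. This is not the library code: `response_matrix`, the array layout, and the step of 0.01 are assumptions made for the example. It shows why the `2*step` normalization matters: injected slopes of 2 and 3 are recovered only when the difference of the sheared means is divided by twice the step.

```python
import numpy as np


def response_matrix(g_1p, g_1m, g_2p, g_2m, step=0.01):
    """Finite-difference shear response, R[i, j] = d<g_i> / d(gamma_j).

    Each input is an (N, 2) array of ellipticities measured on images
    sheared by +step or -step along shear component 1 or 2.
    """
    R = np.empty((2, 2))
    R[:, 0] = (g_1p.mean(axis=0) - g_1m.mean(axis=0)) / (2 * step)
    R[:, 1] = (g_2p.mean(axis=0) - g_2m.mean(axis=0)) / (2 * step)
    return R


# Hand-built catalogue with injected slopes, seeded for reproducibility.
rng = np.random.default_rng(0)
step = 0.01
g0 = rng.normal(0.0, 0.2, size=(1000, 2))
slopes = np.array([[2.0, 0.0], [0.0, 3.0]])

R = response_matrix(
    g0 + step * slopes[:, 0],
    g0 - step * slopes[:, 0],
    g0 + step * slopes[:, 1],
    g0 - step * slopes[:, 1],
    step=step,
)
print(np.round(R, 6))
```

Dropping the factor of 2 in the denominator would double every entry of `R`, which is the kind of change a tight-tolerance pin is meant to catch.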

…tion Pin the calibration linear algebra in get_calibrated_quantities and get_calibrated_m_c (g_corr = inv(R)@g_uncorr; c = mean, c_err = population std; g_corr_mc = g_corr - inv(R)@c) through a lightweight fake gal_metacal, at rtol 1e-12. Teeth: perturbing R moves g_corr; a constant ellipticity offset shifts c by exactly that offset; an inv(R)->R refactor breaks the pins. Environment-independent (in-memory SimpleNamespace; no cluster data). 4 tests, green in-container; adversarially reviewed (teeth confirmed). c_err pins CURRENT behavior (population std, ddof=0); the source carries a "use std of mean" TODO that would need re-baselining if acted on. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
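
The linear algebra pinned in this commit can be sketched as follows. The function name `calibrate` and the (2, N) array layout are hypothetical, and the sketch assumes that `c` is the unweighted mean of the uncorrected ellipticities; the real functions take a `gal_metacal` object and may weight the averages.

```python
import numpy as np


def calibrate(g_uncorr, R):
    """Apply the response correction and additive-bias subtraction.

    g_uncorr : (2, N) uncorrected ellipticities
    R        : (2, 2) response matrix
    """
    R_inv = np.linalg.inv(R)
    g_corr = R_inv @ g_uncorr                   # multiplicative correction
    c = g_uncorr.mean(axis=1)                   # additive bias per component
    c_err = g_uncorr.std(axis=1, ddof=0)        # population std, as pinned
    g_corr_mc = g_corr - (R_inv @ c)[:, None]   # subtract the calibrated offset
    return g_corr, c, c_err, g_corr_mc


rng = np.random.default_rng(1)
g = rng.normal(0.0, 0.3, size=(2, 500))
R = np.array([[2.0, 0.0], [0.0, 3.0]])
g_corr, c, c_err, g_corr_mc = calibrate(g, R)
print(c, c_err)
```

Replacing `ddof=0` with a standard error of the mean (the TODO mentioned above) would shrink `c_err` by roughly a factor of sqrt(N), so the pinned values would need re-baselining.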

Redo test_calculate_pure_eb_runs_on_synthetic_catalog per the correction that the real analysis integrates over a BROADER range than it reports: widen the integration grid to [1,300] arcmin / nbins_int=600 so every reporting bin of the Schneider E/B decomposition is well-defined, and assert finiteness on ALL reporting bins (not just the interior). Add tight value-drift pins (rtol 1e-6, atol 1e-12) on the four deterministic mode vectors xip_E/xim_E/xip_B/xim_B (reproducible to 1.4e-11 across processes; covariance left shape-only since it depends on kmeans patch assignment). Teeth: a ~3% drift on any mode value fails, and coarsening nbins_int back toward ~80 reintroduces edge NaNs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Run nbstripout (0.8.1, the version pinned in .pre-commit-config.yaml) over all 51 tracked notebooks. Removes 13k+ lines of embedded output/execution state that accumulated while the pre-commit nbstripout hook was shadowed by the bd (beads) hook in .git/hooks/pre-commit. Notebooks are unchanged as code. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…I fix Adding GLASS to the container image ran test_matter_maps_are_seed_deterministic for the first time, exposing that glass_mock.py's map path is incompatible with the installed (unpinned) glass+cosmology versions: cosmology.Cosmology.from_camb returns a CambCosmology without comoving_distance, which glass.distance_grid and MultiPlaneConvergence require. The map path had never been exercised (GLASS was absent from the production image until now). xfail (strict=False, raises=AttributeError) so CI is green and the gap is documented; the real fix is to pin a compatible glass+cosmology pair (or adapt the calls) and verify in the fresh image. See fiber glass-cosmology-api-pin. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Purge foreign paths from the dormant inference subsystem: the output root now derives from COSMO_INFERENCE (this repo's cosmo_inference, via common.py) and the shell rundir from a COSMO_INFERENCE_RUNDIR constant, instead of /home/guerrini and the deprecated pure_eb symlink; the external chain/mock dirs (/n09data/guerrini/*) become config keys (inference.chains_dir, .glass_mock_data_dir, .glass_mock_chains_dir). grep confirms zero guerrini/pure_eb/n09data paths remain in inference.smk.

Resolve the pseudo-Cℓ filename schism (consumer adopts the producer's tagged name): pseudo_cl_assets() now requests pseudo_cl_{version}_blind={blind}_{binning}_nbins={nbins}.fits and the matching pseudo_cl_cov_ name, sourcing blind/binning/nbins from a new harmonic.fiducial config block (A/powspace/32); PSEUDO_CL_DIR fixed COSMO_VAL.parent -> COSMO_VAL so the DAG edge to the producer forms. Verified on disk that the producer's tagged files exist at exactly these names (the old bare name does not).

Keys added to both papers/bmodes and papers/cosmo_val configs (both load the workflow). DAG verified: snakemake --list + inference_fiducial -n build cleanly (only the expected dormant cov_tau MissingInput); dry-run guard passes.

Scope: data products only, no cosmosis chains. The FITS-CONTENT schema reconciliation (reader vs producer HDUs) remains for when the subsystem is revived.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
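
A sketch of the tagged naming scheme described in this commit. The `pseudo_cl_names` helper is hypothetical, the covariance name is assumed to carry the same tag as the data file, and the version string in the example is borrowed from elsewhere in this PR purely for illustration.

```python
def pseudo_cl_names(version, blind, binning, nbins):
    """Return (data, covariance) file names under the tagged naming scheme."""
    tag = f"{version}_blind={blind}_{binning}_nbins={nbins}.fits"
    # Assumption: the covariance file reuses the data file's tag verbatim.
    return f"pseudo_cl_{tag}", f"pseudo_cl_cov_{tag}"


# Fiducial block from the commit message: blind A, powspace binning, 32 bins.
fiducial = {"blind": "A", "binning": "powspace", "nbins": 32}
names = pseudo_cl_names("SP_v1.4.6.3", **fiducial)
print(names)
```

Building both the producer's outputs and the consumer's inputs from one helper like this is one way to keep the two sides from drifting apart again.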

Integration test for cosmo_inference/scripts/cosmosis_fitting.py (numpy + astropy only, fully exercisable in-container). Pins the HDU set for both the plain-xi and --use-rho-tau paths, the xi+ then xi- data-vector ordering, and the blocked-covariance offsets (STRT_0..3 = 0, N, 2N, 3N with the tau block truncated 3N->2N). Teeth confirmed via mutation testing (wrong offset/order/truncation all turn it red). 9 tests, green in-container. Covers the riskiest, least-tested code in the inference data-product path; the harmonic-Cℓ augmentation path is left for a follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…t guard git rm --cached the 27 catalog-paper plot PDFs (~13M) under papers/catalog/plots (no .tex in this repo references them — the paper TeX lives in the docs/ repo — so they are regenerable script/notebook outputs) and the 2 tracked glass_mock cosmosis_config .ini files (already matched by .gitignore but committed before the pattern existed). Both now gitignored so they won't reappear. Add test_no_stray_outputs.py: a structural guard asserting no tracked image/data outputs (png/pdf/fits/npy/...) live under any */scripts/ dir — back-pressure against the regression that once committed a 761 KB PNG beside a script. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
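
The structural guard described in this commit might be sketched as below. This is not the contents of test_no_stray_outputs.py: the `stray_outputs` helper and the suffix set are assumptions, and a real guard would presumably obtain the file list from `git ls-files` rather than a hand-written list.

```python
from pathlib import PurePosixPath

# Illustrative subset of the output extensions named in the commit message.
OUTPUT_SUFFIXES = {".png", ".pdf", ".fits", ".npy"}


def stray_outputs(tracked_files):
    """Return tracked paths that look like outputs under any scripts/ directory."""
    strays = []
    for name in tracked_files:
        path = PurePosixPath(name)
        in_scripts = "scripts" in path.parts[:-1]  # any parent dir named scripts
        if in_scripts and path.suffix.lower() in OUTPUT_SUFFIXES:
            strays.append(name)
    return strays


tracked = [
    "scripts/examples/params.py",
    "cosmo_inference/scripts/fig.png",
    "papers/catalog/make_plots.py",
]
print(stray_outputs(tracked))
```

A test built on this would simply assert that the returned list is empty, so any new image or data file committed beside a script fails the suite.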

…_mock Tracks the latent bug surfaced by adding GLASS to the container: glass_mock.py's map path needs a cosmology with comoving_distance that the installed glass+cosmology pair doesn't provide. Map test xfail'd; fix = pin compatible versions, verify in the fresh sandbox, drop the xfail. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Drop the deprecated pure_eb/ compat-symlink prefix from COSMO_VAL, COSMO_INFERENCE, CAT_CONFIG, and CV_RUNDIR. Behavior-identical (same inode; pure_eb -> analyses/shear_2d/bmodes_2d -> code/sp_validation), but no longer depends on a symlink slated for removal. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Resolves the glass_mock map-path AttributeError (Cosmology.from_camb -> CambCosmology lacks comoving_distance). glass 2025.1 is the unique version with the flat API the map path uses AND the legacy cosmo.dc/xm/ef interface that cosmology 2022.10.9 (its newest release) provides; matter_cls lives in the separate glass.ext.camb package. Verified under uv: the full map path runs and is seed-deterministic. Drop the test_glass_mock xfail once the container is rebuilt with the [glass] extra (gated on the sandbox swap). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Phase 1 of the lightcone reproduction: root unions analysis + 3 paper sub-analyses (II bmodes/Daley, III cosmic_shear_2d/Goh, IV harmonic/Guerrini), shared decisions hoisted to root, stop-at-inference. Decisions + claimed findings extracted from docs/unions_release TeX and adversarially verified; astra validate passes. Lives in scratch (Cail's), not papers/. See fiber unions-astra-reproduction; Phase 2 (review) next. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…reads Record on cosmo-val-workflow that the canonical validation set (SP_v1.4.6.3 +-leak_corr, npatch=100) is confirmed and include_pseudo_cl is on by default. File notebooks-cleanup, glass-mock-migration, and restructuring-docs under sp-validation-restructuring. https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ

Add short, structural READMEs to workflow/, papers/, cosmo_val/, scripts/, and results/, each explaining what belongs there and the boundary with its neighbors. The organizing idea is the inputs to a paper figure: analysis lives in workflow/, presentation in papers/. Allow results/README.md past the results/ gitignore. https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ

Add a Repository layout section to the top-level README and a matching repository_structure narrative page to the Sphinx docs, both carrying the target tree and the analysis-vs-presentation division of labor. Wire the new page into toc.rst (Getting Started) and link it from the index landing page. https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ

Remove all .ipynb exploratory notebooks (analysis precursors superseded by sp_validation modules and scripts/, preserved in git history), plus tests_bump.py (scratch) and misc.ipybn (junk, typo extension).

Lift the reusable functions out of the top-level glass_mock/ runner scripts into src/sp_validation/glass_mock.py, alongside the existing generation core:
- downgrade_mask, ia_convergence + growth_factor (from make_unions_glass_sim)
- create_mask_from_catalogue (from create_mask)
- compute_two_point_xi / _cl / _cl_map, get_n_gal_map, TREECORR_CONFIG (from compute_two_point_stats_glass)
- compute_leakage_harmony (from compute_leakage_harmony)
- powspace_bins: factor out the NaMaster square-root bandpower binning that the harmonic stats and leakage paths duplicated

Heavy deps (healpy/pymaster/treecorr/scipy/astropy) stay lazily imported so the module and its import guard resolve without the full GLASS stack.

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ
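
For readers unfamiliar with square-root bandpower binning, here is a minimal sketch of the idea, assuming it means bin edges evenly spaced in sqrt(ell). The `powspace_edges` name, its signature, and the ell range in the example are hypothetical and need not match the library's `powspace_bins`.

```python
import numpy as np


def powspace_edges(lmin, lmax, nbins, power=0.5):
    """Bandpower edges evenly spaced in ell**power (power=0.5: square-root spacing)."""
    grid = np.linspace(lmin**power, lmax**power, nbins + 1)
    return grid ** (1.0 / power)


edges = powspace_edges(lmin=8, lmax=2048, nbins=32)
print(np.round(edges[:4], 2), "...", np.round(edges[-1], 2))
```

Compared with linear binning this gives narrower bins at low ell and wider bins at high ell, while being less aggressive than logarithmic spacing.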

… notebook Per the restructuring principle (library in src/, runners in scripts/), move the four top-level glass_mock/ CLI scripts to scripts/glass_mock/ as thin wrappers that import their logic from sp_validation.glass_mock:
- make_unions_glass_sim.py: keeps arg parsing, mask/sampling/FITS I/O; the Sky helper methods (downgrade_mask, IA/growth math) now call the library.
- create_mask.py, compute_two_point_stats_glass.py, compute_leakage_harmony.py: reduced to argparse + I/O around the library functions.

Delete the exploratory validate_glass_mock.ipynb (hardcoded personal paths, research scratch); it stays in git history. The top-level glass_mock/ directory no longer exists.

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ

- glass_mock.py module docstring now points at scripts/glass_mock/ and notes the post-processing helpers it collects.
- test_dangling_move_references.py: correct the now-stale comment that claimed the top-level glass_mock/ directory survives; it has been fully removed, but the bare string "glass_mock" stays live (workflow, glass_mocks config, results/ paths, mock filenames), so it is intentionally not registered as a retired path.

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ

Relocate demo, catalogue-reduction, and plotting scripts (jupytext/percent format) plus the params.py config template and des_y3 example from notebooks/ to scripts/examples/. notebooks/ is now removed entirely.
- Group all demos and reduction scripts under scripts/examples/ to keep the doc-scanned top-level scripts/ limited to clean CLI tools.
- Fix two paste-corruption syntax errors in demo_calibrate_minimal_cat.py so the file parses (matched unpacking/signature to calibrate_comprehensive_cat).
- Apply ruff safe autofixes (whitespace, unused imports, import sorting, f-strings) to the moved files.

Background agents create transient isolated worktrees under .claude/worktrees/; they must never be tracked. https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ

Repoint params.py config references (CLAUDE.md, quickstart.rst, run_validation.md, post_processing.md, prepare_patch_for_spval.sh) at scripts/examples/params.py, and the extract_information.* reference at the moved scripts/examples/extract_info.py. Register notebooks/params.py -> scripts/examples/params.py in the dangling-move-references guard.
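
The dangling-move-references guard mentioned here can be pictured with the sketch below. It is a simplified stand-in for test_dangling_move_references.py: the `dangling_references` helper is hypothetical, and `MOVE_MAP` holds only two of the moves named in this PR.

```python
# Retired path -> new home, for two of the moves named in this PR.
MOVE_MAP = {
    "notebooks/params.py": "scripts/examples/params.py",
    "notebooks/cosmo_val": "cosmo_val",
}


def dangling_references(files, move_map=MOVE_MAP):
    """Scan {path: text} and return (path, line number, retired path) hits."""
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for old in move_map:
                if old in line:
                    hits.append((path, lineno, old))
    return hits


docs = {
    "docs/quickstart.rst": "Copy notebooks/params.py and edit it.\n",
    "CLAUDE.md": "Config template: scripts/examples/params.py\n",
}
for path, lineno, old in dangling_references(docs):
    print(f"{path}:{lineno}: {old} -> {MOVE_MAP[old]}")
```

Registering each move once in the map means every later commit is checked against it, which is why a bare string such as "glass_mock" that is still legitimately in use is deliberately left out.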

…amples/

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ

All three parallel reorg threads integrated onto cleanup/restructuring. https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ

Convert the user-facing tutorial_UNIONS_SP_v1.0 notebook (removed in the notebooks/ cleanup) into a Sphinx User Guide page rather than deleting it. Updated for the HDF5 catalogue format (>= v1.4.1) and cross-referenced to sp_validation.calibration.get_calibrated_m_c / get_calibrate_e_from_cat, which now automate the hand-rolled metacalibration steps. https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ

sachaguer and others added 30 commits March 3, 2026 15:55

Update script to create the mock to be consistent with CosmoSIS

4ec270d

Update script to compute the two-point stats on mocks

273fc06

Update script to obtain nz txt files

66000cc

Update prior and values files

5760bda

Update gitignore

13bb58e

Push ini files

9142cd8

Update gitignore

e6ba6ee

Update get_chi2_notebooks

a6baa39

Update of theory b modes script and cosmo_val

cdc1c72

Update script to get harmonic space gausian sims

45e1745

Solve conflicts

dc6325b

Update cosmosis_fitting script

7674bef

Fix typo in the nstep computation

1d8e615

Update the theory computation in the gaussian sims

8297930

Fix bugs in cosmo_val after merge + add neutrinos to the theory compu…

b80dca2

…tation

Add script to perform the postprocessing of polychord chains

8839683

Add plotting scripts for unblinding

9218411

Finish the script

3b78d16

Add updates of Lisa for the unblinding party

16ac551

Update plotting notebooks and chain postprocessing.

36bf529

stub: foundation — merge pending local code into develop

aee9b3e

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Update paper plots scripts

a367c10

Update notebooks and file that has been modified

9f877a9

Add the config files for the likelihood in harmonic space

8aa8296

Remove the backup files

9f7a536

Merge branch 'stub/foundation-merge' into develop

c22f075

cailmdaley and others added 30 commits June 13, 2026 15:29

felt: sync fiber frontmatter (serialization round-trip)

e46802b

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

felt(glass-cosmology-api-pin): resolved — pin verified under uv

88f2134

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Delete exploratory notebooks and junk from notebooks/

cfbce19


chore: gitignore Claude Code agent worktrees (.claude/worktrees/)

7ef39a0


Merge reorg/glass-mock-migration: fold glass_mock into src/ + scripts/

109bb4b

Merge reorg/notebooks-cleanup: delete/convert notebooks to scripts/ex…

ca1481e

…amples/

style: wrap long MOVE_MAP entry in dangling-ref guard (E501)

82168eb

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ

felt: close notebooks-cleanup, glass-mock-migration, restructuring-docs

0881c7c


felt: reopen notebooks-cleanup — revisiting blanket .ipynb deletion

853e396

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ


cailmdaley wants to merge 109 commits into develop from cleanup/restructuring


3 participants
