[EPIC] Documentation truth-up: make the docs describe what views-postprocessing actually is #70

Description

@Polichinel

Problem

The repo's orienting documentation describes a version of views-postprocessing that no longer exists. The README, ADR-001 (ontology), and ADR-002 (topology) were written on 2026-06-02 with a runtime geopandas spatial mapper and bundled shapefiles as the stable core. That whole architecture was deleted (ADR-011 / C-39 / PR #42) and replaced by a precomputed GAUL lookup join plus a delivery/input-integrity layer. The docs never caught up.

A newcomer (or the maintainer) reading the docs today is actively misled:

  • README still documents PriogridCountryMapper, shapefiles, geopandas, disk/memory caching, and cachetools as core features and API — all deleted. It also carries a wrong APPWRITE_PROD_FORECASTS_COLLECTION_ID (the un_fao postmortem found forecasts_metadata does not exist in live Appwrite), a dead link to a deleted mapping/README.md, an incomplete output schema, and a stale dependency table/pin.
  • ADR-001 (ontology) lists "Geographic Data Assets" (shapefiles) and "Spatial Mapping Engine" as the two Authoritative / Stable core categories — both deleted. Still marked "Accepted," superseded-in-fact by ADR-011 with no note.
  • ADR-002 ("topology and dependency rules") names none of the sibling repos — it never explains the relationship to pipeline-core (the framework vpp inherits from), views-faoapi (the consumer), or views-datafactory (the producer).

The correct picture — the repo's role, the seams, the current package layout — exists only scattered across the risk register (C-40), code docstrings, and a views-models postmortem.

Why this matters

  • Docs are governance here (ADR-000, ADR-010). Stale governance is worse than none: it asserts deleted design as the stable core.
  • Real cost already paid: mapping out what the repo does and how it relates to the others took a multi-attempt cross-repo investigation (the un_fao pre/post-run postmortems) that a one-page orientation doc would have prevented.
  • The repo is mid-transition (input-integrity shipped; the frames/samples migration C-40 looms). Onboarding and decisions both need a correct baseline.

Desired end state

Reading the docs top-down — README → a role-&-seams orientation doc → ADRs → CICs — yields a correct, current, coherent picture:

  • what the repo is: a post-forecast delivery + input-integrity layer (not a spatial-mapping library, not a statistical postprocessor);
  • its role vs the sibling repos (datafactory → pipeline-core → vpp → faoapi), and that it is-a pipeline-core postprocessor (Template-Method subclass);
  • the seams: delivery/ representation-free invariants vs the unfao/extraction.py pandas seam; the inherited pandas base and the C-40 gate; the draws/frames contract and where collapse happens (downstream in faoapi);
  • the current package layout and what each module does;
  • no document describes deleted code as current.

Plus a lightweight guardrail that prevents the docs from silently re-staling.
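
To make the "is-a pipeline-core postprocessor (Template-Method subclass)" relationship concrete for the orientation doc, here is a minimal sketch of the pattern. Every class, method, and field name below (BasePostprocessor, LookupJoinPostprocessor, cell_id, country) is hypothetical and chosen for illustration; none of it is pipeline-core's or views-postprocessing's real API.

```python
from abc import ABC, abstractmethod


class BasePostprocessor(ABC):
    """Hypothetical stand-in for a framework base class (not pipeline-core's real API)."""

    def run(self, rows: list[dict]) -> list[dict]:
        # Template method: the base class fixes the order of the steps,
        # subclasses only fill in the hooks.
        self._validate_input(rows)
        transformed = self._transform(rows)
        return self._deliver(transformed)

    def _validate_input(self, rows: list[dict]) -> None:
        # Default hook; a subclass may tighten it.
        if not rows:
            raise ValueError("no forecast rows to postprocess")

    @abstractmethod
    def _transform(self, rows: list[dict]) -> list[dict]:
        """Subclass-specific step."""

    def _deliver(self, rows: list[dict]) -> list[dict]:
        # Default hook: hand the rows back unchanged.
        return rows


class LookupJoinPostprocessor(BasePostprocessor):
    """Hypothetical subclass: enriches rows from a precomputed lookup table."""

    def __init__(self, lookup: dict[int, str]) -> None:
        self._lookup = lookup

    def _transform(self, rows: list[dict]) -> list[dict]:
        # Join each row against the lookup; unknown cells get None.
        return [{**row, "country": self._lookup.get(row["cell_id"])} for row in rows]
```

The point for the docs is the division of responsibility: the inherited base owns the control flow, so anything the base assumes about its data representation is inherited by the subclass as well.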

Scope

In: README rewrite; ADR-001 supersession + replacement ontology; ADR-002 topology with real repos + seams; a new role-&-seams orientation doc; current package/module map; managers/README refresh; ADR index supersession notes; fix/flag the wrong collection-ID; remove dead links; a doc-accuracy guardrail.

Out: any code change (this epic is docs-only); the C-40 / frames migration itself; the operational data-correction procedure (#15 — separate, needs FAO input); CIC content (the GaulLookupEnricher + UNFAOPostProcessorManager CICs were refreshed in #66/#68 and are current); the historical cross_repo_integration_report.md and reconciliation_migration.md (correct as point-in-time snapshot / tombstone — leave as-is).

Stories (in order)

  1. S1 (planning / needs-decision) — Documentation information architecture + ADR-supersession policy. (root dependency)
  2. S2 (documentation) — New orientation doc: role & seams. (dep: S1)
  3. S3 (documentation) — Rewrite the README to current reality. (dep: S1, S2)
  4. S4 (documentation) — Truth-up the foundational ADRs (001 supersede, 002 topology, index). (dep: S1, S2)
  5. S5 (documentation) — Refresh module-level + package-layout docs; kill dead links. (dep: S3)
  6. S6 (testing) — Doc-accuracy guardrail (no references to deleted symbols; links resolve). (dep: S2–S5)

Dependency graph

S1 ──► S2 ─┬─► S3 ──► S5 ─┬─► S6
           └─► S4 ────────┘

(Direct edges already implied by the chain, such as S1 → S3 and S1 → S4, are omitted.)

Epic-level acceptance criteria

  • No doc references deleted symbols (PriogridCountryMapper, mapping.py, shapefiles, geopandas, cachetools, disk/memory caching) as current.
  • A reader can answer, from the docs alone: what is this repo, what's its role vs pipeline-core / faoapi / datafactory, where are the seams, what does each module do.
  • ADR-001 is marked superseded (with its replacement) and ADR-002 names the sibling repos + dependency direction + seams.
  • README dependency table, env-var section (collection-ID flagged/corrected), package structure, and links are all accurate; no dead links.
  • The doc-accuracy guardrail runs (CI or docs/validate_docs.sh) and is green.
  • All stories closed; tracking issue complete.
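
One possible shape for the doc-accuracy guardrail, as a rough sketch rather than the planned implementation: scan Markdown files for the deleted symbols listed in the criteria above and check that relative links resolve. The function name check_docs and the exemption mechanism are assumptions. Only the unambiguous identifiers are matched; "shapefiles" and "disk/memory caching" are left out because plain substring matching cannot tell a current claim from a historical mention. For the same reason, documents that record the deletion itself (ADR-011, for example) would need an exemption too.

```python
import re
from pathlib import Path

# Unambiguous deleted symbols, taken from the acceptance criteria.
DELETED_SYMBOLS = ("PriogridCountryMapper", "mapping.py", "geopandas", "cachetools")

# Point-in-time docs this issue says to leave as-is; they may mention deleted code.
EXEMPT = {"reconciliation_migration.md", "cross_repo_integration_report.md"}

# Inline Markdown links: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def check_docs(root: Path) -> list[str]:
    """Return one human-readable problem per stale symbol or dead relative link."""
    problems = []
    for doc in sorted(root.rglob("*.md")):
        text = doc.read_text(encoding="utf-8")
        if doc.name not in EXEMPT:
            for symbol in DELETED_SYMBOLS:
                if symbol in text:
                    problems.append(f"{doc}: mentions deleted symbol {symbol!r}")
        for target in LINK_RE.findall(text):
            # External links and in-page anchors are out of scope for this sketch.
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue
            path = (doc.parent / target.split("#", 1)[0]).resolve()
            if not path.exists():
                problems.append(f"{doc}: dead link {target!r}")
    return problems
```

A CI job or docs/validate_docs.sh could call check_docs(Path(".")) and exit non-zero when the returned list is non-empty.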

Investigation notes (grounding)

  • Mapper deletion: ADR-011, C-39, PR #42 ("Delete the dead geopandas runtime mapper (C-39)"). Lookup lives at views_postprocessing/data/gaul_lookup.parquet (GaulLookupEnricher).
  • Current package: views_postprocessing/delivery/{coverage,identity,observed_range,provenance}.py + views_postprocessing/unfao/{extraction,enrichment,gaul_schema,frames,source_metadata}.py + unfao/managers/unfao.py.
  • Role/seams source material: register C-40, the un_fao pre/post-run postmortems (views-models reports/).
  • Correct-as-is docs to leave alone: docs/reconciliation_migration.md (tombstone), docs/cross_repo_integration_report.md (2026-06-12 snapshot).