From b54584be3f86c571c585db1f1377819c2266a168 Mon Sep 17 00:00:00 2001
From: Polichinl
Date: Sat, 27 Jun 2026 19:05:18 +0200
Subject: [PATCH] =?UTF-8?q?docs(readme):=20S3=20=E2=80=94=20rewrite=20the?=
 =?UTF-8?q?=20README=20to=20current=20reality=20(#73)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The README described the deleted runtime spatial mapper as the repo's core:
PriogridCountryMapper, shapefiles, geopandas, disk/memory caching, cachetools —
all removed (ADR-011 / C-39). It also carried a wrong
PROD_FORECASTS_COLLECTION_ID, a dead link to a deleted mapping/README, an
incomplete 7-column output schema, and a stale dependency table.

Rewritten to current reality: a post-forecast delivery layer (a pipeline-core
postprocessor) that GAUL-enriches via lookup-join and enforces input-integrity
guards. Correct dependency table (drops cachetools), real package structure,
the full 9-column schema, the env-var collection-ID flagged as needs-verify,
pipeline stages incl. validate/coverage/clip, and a prominent pointer to the
S2 orientation doc. No deleted-symbol references; all relative links resolve.

Co-Authored-By: Claude Opus 4.8
---
 README.md | 391 ++++++++++++++++--------------------------------------
 1 file changed, 114 insertions(+), 277 deletions(-)

diff --git a/README.md b/README.md
index 718b83c..9c505f6 100644
--- a/README.md
+++ b/README.md
@@ -4,355 +4,192 @@
 [![Poetry](https://img.shields.io/badge/dependency%20management-poetry-blueviolet)](https://python-poetry.org/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

-A modular postprocessing framework for the **VIEWS** (Violence Early-Warning System) pipeline. This package provides tools for enriching conflict prediction data with geographic metadata, transforming outputs for partner organizations, and managing spatial mappings between PRIO-GRID cells and administrative boundaries.
+The **post-forecast delivery layer** for the **VIEWS** (Violence Early-Warning System)
+pipeline. It takes finished VIEWS forecasts, enriches them with geographic metadata,
+guards their integrity, and delivers them to a partner store.

----
-
-## Table of Contents
-
-- [Overview](#overview)
-- [Features](#features)
-- [Installation](#installation)
-- [Package Structure](#package-structure)
-- [Modules](#modules)
-  - [UNFAO Postprocessor](#unfao-postprocessor)
-  - [PRIO-GRID Spatial Mapping](#prio-grid-spatial-mapping)
-- [Shapefiles](#shapefiles)
-- [Quick Start](#quick-start)
-- [Configuration](#configuration)
-- [API Reference](#api-reference)
-- [Contributing](#contributing)
-- [License](#license)
-
----
-
-## Overview
+The only live delivery today is the **UN FAO** path (`views_postprocessing/unfao/`).

-The VIEWS platform generates conflict predictions at the **PRIO-GRID** level—a standardized global grid system with ~50×50 km cells. Partner organizations like the **UN Food and Agriculture Organization (FAO)** require this data enriched with administrative metadata (country codes, province names, coordinates) for operational use.
-
-`views-postprocessing` bridges this gap by providing:
-
-1. **Postprocessor Managers** - Pipeline components that read, transform, validate, and deliver prediction data
-2. **Spatial Mapping Tools** - Bidirectional mapping between PRIO-GRID cells and multi-level administrative boundaries
-3. **Geographic Enrichment** - Automatic addition of coordinates, ISO codes, and GAUL boundary identifiers
+> **New here? Read [`docs/architecture/role_and_seams.md`](docs/architecture/role_and_seams.md) first.**
+> It explains what this repo is, how it relates to pipeline-core / faoapi / datafactory,
+> and where its internal seams are. This README is install + quickstart only.

 ---

-## Features
+## What this is (and isn't)

-- 🗺️ **Multi-level Administrative Mapping** - Map PRIO-GRID cells to countries, Admin Level 1 (provinces), and Admin Level 2 (districts)
-- ⚡ **High-Performance Caching** - Disk-based and in-memory LRU caching for spatial operations
-- 🔄 **Pipeline Integration** - Seamless integration with `views-pipeline-core` managers
-- 📦 **Appwrite Integration** - Read from and write to Appwrite cloud storage buckets
-- 🌍 **Comprehensive Shapefiles** - Bundled Natural Earth and GAUL 2024 boundary data
-- ✅ **Schema Validation** - Automatic validation of output data schemas
+- **It is** a concrete **pipeline-core postprocessor**: `UNFAOPostProcessorManager`
+  subclasses pipeline-core's `PostprocessorManager` and fills the post-forecast lifecycle
+  (`read → transform → validate → save`) for the FAO partner.
+- **It enriches and delivers** — it joins GAUL administrative metadata onto predictions
+  (via a precomputed lookup, **ADR-011**) and enforces input-integrity invariants before
+  upload.
+- **It is *not*** a spatial-mapping library (the old runtime spatial mapper was removed —
+  see ADR-011 / C-39) and **not** a statistical post-processor. Draw-collapse (MAP/HDI) happens
+  downstream in views-faoapi; reconciliation lives in `views_frames_reconcile`. See the
+  orientation doc.

 ---

 ## Installation

-### Using Poetry (recommended)
-
 ```bash
-# Clone the repository
-git clone https://github.com/prio-data/views-postprocessing.git
+# With Poetry (recommended)
+git clone https://github.com/views-platform/views-postprocessing.git
 cd views-postprocessing
-
-# Install with Poetry
 poetry install
-```

-### Using pip
-
-```bash
+# Or with pip
 pip install views-postprocessing
 ```

-### Dependencies
-
-| Package | Version | Description |
-|---------|---------|-------------|
-| `views-pipeline-core` | >=2.1.3,<3.0.0 | Core pipeline managers and utilities |
-| `cachetools` | ==6.2.1 | LRU and TTL caching for spatial lookups |
-
-**Note:** This package requires Python 3.11 or higher (compatible up to 3.15).
+Requires **Python 3.11–3.14**.

----
-
-## Package Structure
+### Dependencies

-```
-views-postprocessing/
-├── pyproject.toml              # Package configuration
-├── README.md                   # This file
-└── views_postprocessing/
-    ├── shapefiles/             # Bundled geographic data
-    │   ├── GAUL_2024_L1/       # Admin Level 1 boundaries
-    │   ├── GAUL_2024_L2/       # Admin Level 2 boundaries
-    │   ├── ne_10m_admin_0_countries/   # Natural Earth countries (10m)
-    │   ├── ne_110m_admin_0_countries/  # Natural Earth countries (110m)
-    │   └── priogrid_cellshp/   # PRIO-GRID cell geometries
-    └── unfao/                  # UN FAO-specific module
-        ├── managers/
-        │   ├── unfao.py        # UNFAOPostProcessorManager
-        │   └── README.md       # Manager documentation
-        └── mapping/
-            ├── mapping.py      # PriogridCountryMapper
-            └── README.md       # Mapping documentation
-```
+| Package | Version | Why |
+|---------|---------|-----|
+| `views-pipeline-core` | `>=2.1.3,<3.0.0` | The framework: lifecycle base classes, data loader, dataset container, Appwrite/datastore tools |
+| `views-frames` | `>=1.0,<2` | The frame data contract (used by the conformance adapter; the live path is still pandas — see C-40) |

 ---

-## Modules
-
-### UNFAO Postprocessor
-
-The `UNFAOPostProcessorManager` transforms VIEWS predictions for UN FAO consumption:
+## The UN FAO delivery

 ```python
 from views_pipeline_core.managers.postprocessor import PostprocessorPathManager
 from views_postprocessing.unfao.managers.unfao import UNFAOPostProcessorManager

-# Initialize
 path_manager = PostprocessorPathManager("un_fao")
-manager = UNFAOPostProcessorManager(
-    model_path=path_manager,
-    wandb_notifications=True
-)
-
-# Execute full pipeline
-manager.execute()
+manager = UNFAOPostProcessorManager(model_path=path_manager)
+manager.execute()   # read → transform → validate → save
 ```

-#### Pipeline Stages
+In practice the manager is constructed and run by **views-models**
+(`postprocessors/un_fao/main.py`), not invoked directly.
+
+### Pipeline stages

-| Stage | Method | Description |
-|-------|--------|-------------|
-| **Read** | `_read()` | Fetches historical data from ViewsER and forecast data from Appwrite |
-| **Transform** | `_transform()` | Enriches data with geographic metadata using `PriogridCountryMapper` |
-| **Validate** | `_validate()` | Ensures schema compliance and required columns |
-| **Save** | `_save()` | Saves to local parquet and uploads to UN FAO Appwrite bucket |
+| Stage | Method(s) | What happens |
+|-------|-----------|--------------|
+| **Read** | `_read_historical_data`, `_read_forecast_data` | Historical actuals from views-datafactory (via the inherited loader); the forecast file from the Appwrite prediction store. The forecast file's identity is checked before use (C-25). |
+| **Transform** | `_transform` → `_append_metadata` | Joins the 9 GAUL metadata columns onto each frame via `GaulLookupEnricher` (a parquet lookup). Prediction values are **not** transformed. |
+| **Validate** | `_validate`, `_check_coverage` | Null-gate on the metadata columns; region coverage + GAUL-excluded-cell guards (C-34 / C-30). |
+| **Clip** | `_clip_observed_history` | Drops fabricated zero-padded tail months from the historical actuals (C-26); the forecast is untouched. |
+| **Save** | `_save` | Writes parquet and uploads to the UN FAO bucket with structured provenance (C-15). |

-#### Output Schema
+### Output schema (geographic metadata columns)

-The postprocessor enriches data with these columns:
+`GaulLookupEnricher` adds these 9 columns (the contract in `unfao/gaul_schema.py`):

 | Column | Type | Description |
 |--------|------|-------------|
-| `pg_xcoord` | float | PRIO-GRID cell centroid X coordinate (longitude) |
-| `pg_ycoord` | float | PRIO-GRID cell centroid Y coordinate (latitude) |
+| `pg_xcoord` | float | PRIO-GRID cell centroid longitude |
+| `pg_ycoord` | float | PRIO-GRID cell centroid latitude |
 | `country_iso_a3` | str | ISO 3166-1 alpha-3 country code |
-| `admin1_gaul1_code` | int | GAUL Level 1 administrative code |
-| `admin1_gaul1_name` | str | GAUL Level 1 administrative name |
-| `admin2_gaul2_code` | int | GAUL Level 2 administrative code |
-| `admin2_gaul2_name` | str | GAUL Level 2 administrative name |
-
----
-
-### PRIO-GRID Spatial Mapping
-
-The `PriogridCountryMapper` class provides comprehensive spatial mapping capabilities:
-
-```python
-from views_postprocessing.unfao.mapping.mapping import PriogridCountryMapper
-
-# Initialize with disk caching
-mapper = PriogridCountryMapper(
-    use_disk_cache=True,
-    cache_dir="~/.priogrid_mapper_cache",
-    cache_ttl=86400 * 7  # 7 days
-)
-
-# Single cell lookup
-country = mapper.find_country_for_gid(123456)
-print(f"Country: {country}")  # e.g., "TZA"
-
-# Find all PRIO-GRID cells in a country
-gids = mapper.find_gids_for_country("NGA")
-print(f"Nigeria has {len(gids)} PRIO-GRID cells")
-
-# Admin boundary lookups
-admin1_info = mapper.find_admin1_for_gid(123456)
-admin2_info = mapper.find_admin2_for_gid(123456)
-
-# Batch processing
-gid_list = [123456, 123457, 123458, 123459]
-countries = mapper.batch_country_mapping(gid_list)
-
-# DataFrame enrichment
-enriched_df = mapper.enrich_dataframe_with_pg_info(df, gid_column="priogrid_gid")
-```
-
-#### Mapping Decision Logic
-
-The mapper uses a **largest overlap** algorithm to handle cells spanning multiple boundaries:
-
-1. Find all administrative regions intersecting the grid cell
-2. Calculate overlap ratio for each region
-3. Assign to the region with the largest overlap
-
-This provides deterministic, reproducible results even for border cells.
-
-#### Key Methods
-
-| Method | Description |
-|--------|-------------|
-| `find_country_for_gid(gid)` | Get ISO A3 country code for a PRIO-GRID cell |
-| `find_gids_for_country(iso_a3)` | Get all PRIO-GRID cells within a country |
-| `find_admin1_for_gid(gid)` | Get GAUL Level 1 info for a cell |
-| `find_admin2_for_gid(gid)` | Get GAUL Level 2 info for a cell |
-| `batch_country_mapping(gids)` | Map multiple cells efficiently |
-| `batch_country_mapping_parallel(gids)` | Parallel batch mapping |
-| `enrich_dataframe_with_pg_info(df)` | Add all geographic columns to a DataFrame |
-| `get_all_countries()` | Get list of all available countries |
-| `get_all_country_ids()` | Get list of all country ISO codes |
-| `get_all_priogrids()` | Get all PRIO-GRID cell data |
-| `get_all_priogrid_ids()` | Get list of all PRIO-GRID GIDs |
-
----
-
-## Shapefiles
-
-The package bundles essential geographic datasets:
-
-| Dataset | Resolution | Source | Use Case |
-|---------|------------|--------|----------|
-| **Natural Earth Countries (110m)** | 110m | Natural Earth | Fast country lookups |
-| **Natural Earth Countries (10m)** | 10m | Natural Earth | Precise country lookups |
-| **PRIO-GRID Cells** | 0.5° × 0.5° | PRIO | Grid cell geometries |
-| **GAUL Level 1** | - | FAO GAUL 2024 | Province/state boundaries |
-| **GAUL Level 2** | - | FAO GAUL 2024 | District/county boundaries |
-
-All shapefiles use **EPSG:4326 (WGS84)** coordinate reference system.
+| `admin1_gaul0_code` | int | GAUL level-0 (country) code |
+| `admin1_gaul0_name` | str | GAUL level-0 (country) name |
+| `admin1_gaul1_code` | int | GAUL level-1 (province) code |
+| `admin1_gaul1_name` | str | GAUL level-1 (province) name |
+| `admin2_gaul2_code` | int | GAUL level-2 (district) code |
+| `admin2_gaul2_name` | str | GAUL level-2 (district) name |

 ---

-## Quick Start
-
-### Basic Postprocessing
-
-```python
-from views_pipeline_core.managers.postprocessor import PostprocessorPathManager
-from views_postprocessing.unfao.managers.unfao import UNFAOPostProcessorManager
-
-# Set up the manager
-path_manager = PostprocessorPathManager("un_fao")
-manager = UNFAOPostProcessorManager(model_path=path_manager)
+## Package structure

-# Run the complete pipeline
-manager.execute()
 ```
-
-### Standalone Spatial Mapping
-
-```python
-from views_postprocessing.unfao.mapping.mapping import PriogridCountryMapper
-import pandas as pd
-
-# Initialize mapper
-mapper = PriogridCountryMapper(use_disk_cache=True)
-
-# Create sample data
-df = pd.DataFrame({
-    "priogrid_gid": [123456, 123457, 123458],
-    "prediction": [0.05, 0.12, 0.08]
-})
-
-# Enrich with geographic metadata
-enriched = mapper.enrich_dataframe_with_pg_info(df, gid_column="priogrid_gid")
-print(enriched.columns)
-# Index(['priogrid_gid', 'prediction', 'pg_xcoord', 'pg_ycoord',
-#        'country_iso_a3', 'admin1_gaul1_code', 'admin1_gaul1_name',
-#        'admin2_gaul2_code', 'admin2_gaul2_name'], dtype='object')
+views-postprocessing/
+├── pyproject.toml
+├── README.md
+├── docs/
+│   ├── architecture/role_and_seams.md   # READ FIRST — role + seams
+│   ├── ADRs/                            # decisions + rationale
+│   └── CICs/                            # class-level contracts
+└── views_postprocessing/
+    ├── delivery/                # representation-free integrity invariants (no pandas)
+    │   ├── coverage.py          # region cell-count + excluded-cell guards
+    │   ├── identity.py          # forecast-file identity guard
+    │   ├── observed_range.py    # fabricated-month decision
+    │   └── provenance.py        # structured upload provenance
+    ├── unfao/                   # FAO-specific delivery
+    │   ├── extraction.py        # the pandas → primitives seam
+    │   ├── enrichment.py        # GaulLookupEnricher (the GAUL join)
+    │   ├── gaul_schema.py       # the 9-column metadata contract
+    │   ├── source_metadata.py   # producer (datafactory) data-facts
+    │   ├── frames.py            # views-frames conformance adapter (not on the live path)
+    │   └── managers/unfao.py    # UNFAOPostProcessorManager
+    └── data/gaul_lookup.parquet # the precomputed GAUL lookup (ADR-011)
 ```

 ---

 ## Configuration

-### Environment Variables
-
-For Appwrite integration, configure these in your `.env` file:
+The FAO delivery reads Appwrite connection settings from the environment (the manager calls
+`os.getenv` — there is no startup validation yet, tracked in #11):

 ```bash
-# Appwrite Connection
+# Appwrite connection (secrets)
 APPWRITE_ENDPOINT=https://cloud.appwrite.io/v1
-APPWRITE_DATASTORE_PROJECT_ID=your_project_id
-APPWRITE_DATASTORE_API_KEY=your_api_key
+APPWRITE_DATASTORE_PROJECT_ID=...
+APPWRITE_DATASTORE_API_KEY=...

-# Production Forecasts Bucket (Input)
+# Production-forecasts store (input)
 APPWRITE_PROD_FORECASTS_BUCKET_ID=production_forecasts
 APPWRITE_PROD_FORECASTS_BUCKET_NAME=Production Forecasts
-APPWRITE_PROD_FORECASTS_COLLECTION_ID=forecasts_metadata
-
-# UN FAO Bucket (Output)
-APPWRITE_UNFAO_BUCKET_ID=unfao_data
-APPWRITE_UNFAO_BUCKET_NAME=UN FAO Data
-APPWRITE_UNFAO_COLLECTION_ID=unfao_metadata
-
-# Metadata Database
-APPWRITE_METADATA_DATABASE_ID=file_metadata
-APPWRITE_METADATA_DATABASE_NAME=File Metadata
-```
-
-### Caching Configuration
-
-```python
-# Disk caching (persistent across sessions)
-mapper = PriogridCountryMapper(
-    use_disk_cache=True,
-    cache_dir="/path/to/cache",     # Default: ~/.priogrid_mapper_cache
-    cache_ttl=604800                # 7 days in seconds
-)
-
-# Memory-only caching (faster, but not persistent)
-mapper = PriogridCountryMapper(
-    use_disk_cache=False
-)
+APPWRITE_PROD_FORECASTS_COLLECTION_ID=...   # TODO: verify against live Appwrite — the
+                                            # previously-documented `forecasts_metadata`
+                                            # was found NOT to exist (un_fao postmortem)
+APPWRITE_PROD_FORECASTS_COLLECTION_NAME=Production Forecasts
+
+# UN FAO store (output)
+APPWRITE_UNFAO_BUCKET_ID=...
+APPWRITE_UNFAO_BUCKET_NAME=...
+APPWRITE_UNFAO_COLLECTION_ID=...
+APPWRITE_UNFAO_COLLECTION_NAME=...
+
+# Metadata database
+APPWRITE_METADATA_DATABASE_ID=...
+APPWRITE_METADATA_DATABASE_NAME=...
 ```

 ---

-## API Reference
-
-For detailed API documentation, see the module-specific README files:
+## Documentation

-- [UNFAO Manager Documentation](views_postprocessing/unfao/managers/README.md)
-- [PRIO-GRID Mapping Documentation](views_postprocessing/unfao/mapping/README.md)
+| Doc | What it covers |
+|-----|----------------|
+| [`docs/architecture/role_and_seams.md`](docs/architecture/role_and_seams.md) | **Start here** — role vs the sibling repos + internal seams |
+| [`docs/ADRs/`](docs/ADRs/) | Architecture decisions (esp. ADR-011 mapper→lookup; ADR-012 ontology) |
+| [`docs/CICs/`](docs/CICs/) | Class intent contracts (`UNFAOPostProcessorManager`, `GaulLookupEnricher`) |
+| `reports/technical_risk_register.md` | Tracked risks (C-40 the pandas gate, C-25/C-30/C-15 the delivery guards) |

 ---

 ## Contributing

-Contributions are welcome! Please follow these steps:
-
-1. Fork the repository
-2. Create a feature branch (`git checkout -b feature/amazing-feature`)
-3. Commit your changes (`git commit -m 'Add amazing feature'`)
-4. Push to the branch (`git push origin feature/amazing-feature`)
-5. Open a Pull Request
-
-### Development Setup
+1. Branch off `development`.
+2. Make the change; keep `ruff` and the test suite green (`ruff check . && PYTHONPATH=. pytest -q`).
+3. Open a PR into `development`.

-```bash
-# Clone and install in development mode
-git clone https://github.com/prio-data/views-postprocessing.git
-cd views-postprocessing
-poetry install
-```
+Contributor protocols (incl. the conventions for AI agents) are under
+[`docs/contributor_protocols/`](docs/contributor_protocols/).

 ---

 ## License

-This project is part of the VIEWS platform developed by the **Peace Research Institute Oslo (PRIO)**. See the [LICENSE](LICENSE) file for details.
+MIT — part of the VIEWS platform developed at the **Peace Research Institute Oslo (PRIO)**.
+See [LICENSE](LICENSE).

 ---

-## Related Packages
-
-| Package | Description |
-|---------|-------------|
-| [`views-pipeline-core`](https://github.com/views-platform/views-pipeline-core) | Core pipeline managers and utilities |
+## Related packages

----
\ No newline at end of file
+| Package | Role |
+|---------|------|
+| [`views-pipeline-core`](https://github.com/views-platform/views-pipeline-core) | The framework this repo extends |
+| [`views-datafactory`](https://github.com/views-platform/views-datafactory) | Produces the data this repo consumes |
+| [`views-faoapi`](https://github.com/views-platform/views-faoapi) | Serves the delivered FAO data (and collapses draws) |
+| [`views-frames`](https://github.com/views-platform/views-frames) | The frame data contract + `views_frames_summarize` / `views_frames_reconcile` |