diff --git a/DEPLOY_MODEL.md b/DEPLOY_MODEL.md
new file mode 100644
index 0000000..e448a4e
--- /dev/null
+++ b/DEPLOY_MODEL.md
@@ -0,0 +1,95 @@
+# Deploying a Trained Model to ClimateVision
+
+End-to-end checklist to take a model from training (Colab) to live on
+**climatevision.green**. Do **flooding** first; deforestation and ice follow the
+same steps with their own datasets.
+
+---
+
+## A. Train (Google Colab, GPU runtime)
+
+Use `notebooks/flood_training_colab.ipynb` (Run all), or run the steps manually:
+
+1. **Setup** — clone the repo, `pip install -r requirements.txt`, `pip install -e .`, confirm `torch.cuda.is_available()`.
+2. **Download data** — `gcloud storage cp --recursive gs://sen1floods11/v1.1/data/flood_events/HandLabeled/{S2Hand,LabelHand}` and `.../splits` into `data/sen1floods11/`.
+3. **Convert** —
+   ```bash
+   python scripts/prepare_sen1floods11.py \
+     --s2-dir data/sen1floods11/S2Hand \
+     --label-dir data/sen1floods11/LabelHand \
+     --splits-dir data/sen1floods11/splits/flood_handlabeled \
+     --out-dir data/datasets/flooding
+   ```
+4. **Train** —
+   ```bash
+   python scripts/train_real.py --analysis-type flooding \
+     --data-dir data/datasets/flooding \
+     --epochs 50 --batch-size 8 --image-size 256 --out models
+   ```
+   Watch `val_iou` rise. Output: `models/flooding_/best_model.pth`.
+
+## B. Validate before promoting
+
+5. **Evaluate** — `python scripts/evaluate.py --checkpoint /best_model.pth --data-dir data/datasets/flooding`
+6. **Governance gate** — `python scripts/governance_ci_gate.py`. **Only promote a model that passes.** This is the line between a "preview" and something an agency can act on.
+7. **Model card** — `python scripts/generate_model_card.py --checkpoint /best_model.pth` (records metrics + provenance for the audit trail).
+
+## C. Export
+
+8. **ONNX** — `python scripts/export_model.py --checkpoint /best_model.pth`
+   produces `/model.onnx` (+ quantized + `export_info.json`).
+   The API auto-serves this: `inference/pipeline.py` loads a `.pth` if present,
+   else `models/_*/model.onnx` via onnxruntime.
+
+## D. Get the model into the repo / image
+
+Weights are **not** kept in git history by default, but the ONNX run dirs are
+small (a few MB) and **are** committed so Render (which builds from GitHub) can
+ship them.
+
+9. Download `best_model.pth` and `model.onnx` from Colab.
+10. Place them on your laptop under `models/flooding_/`.
+11. Commit + push:
+    ```bash
+    git add models/flooding_/
+    git commit -m "feat(models): trained Sen1Floods11 flood model (val_iou=)"
+    git push origin main
+    ```
+
+> Alternative (large weights): instead of committing, put the files on the Fly
+> volume at `/app/outputs` and point `config.yaml` `weights:` at them.
+
+## E. Deploy
+
+12. The push triggers a Render rebuild (or click **Sync** on the blueprint).
+13. Confirm secrets are set once: `render env list climatevision-green` —
+    `GEE_SERVICE_ACCOUNT_KEY_JSON`, `GEE_PROJECT_ID`, `CLIMATEVISION_ALLOW_DEV_KEY=0`,
+    `CLIMATEVISION_CORS_ORIGINS` including `https://climatevision.green`.
+
+## F. Verify it's real
+
+14. Health + model check:
+    ```bash
+    curl -s https://climatevision.green/api/health | jq
+    curl -s https://climatevision.green/api/health/models | jq
+    ```
+15. Confirm auth is locked down (cv_dev must be rejected):
+    ```bash
+    curl -s -H "X-API-Key: cv_dev" https://climatevision.green/api/runs  # expect 401
+    ```
+16. When `/api/health/models` reports a real loaded model (not demo/untrained),
+    **remove the "technical preview" label** from the UI/API — you are now
+    serving genuine predictions.
+
+---
+
+## Per-type status
+
+| Analysis type | Dataset | Status |
+|---------------|---------|--------|
+| flooding | Sen1Floods11 | first target — follow this doc |
+| deforestation | MultiEarth Amazon | same steps, `--analysis-type deforestation` |
+| ice_melting | AI4Arctic v2 | same steps, `--analysis-type ice_melting` |
+
+Keep each model's `provenance.json`, model card, and metrics — that audit trail
+is what makes the platform credible for NGO and government use.
diff --git a/notebooks/flood_training_colab.ipynb b/notebooks/flood_training_colab.ipynb
new file mode 100644
index 0000000..3df439a
--- /dev/null
+++ b/notebooks/flood_training_colab.ipynb
@@ -0,0 +1,252 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# ClimateVision \u2014 Flood Model Training (Sen1Floods11)\n",
+    "\n",
+    "Run top to bottom on a **GPU runtime** (Runtime \u2192 Change runtime type \u2192 GPU).\n",
+    "This trains a real flood-detection U-Net and exports it for deployment.\n",
+    "\n",
+    "**Prerequisite:** the latest scripts must be on `main` (push from your laptop first):\n",
+    "`download_datasets.py`, `prepare_sen1floods11.py`, `train_real.py`, and the `dataset.py` change.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Setup \u2014 clone repo, install, check GPU\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "!git clone https://github.com/Climate-Vision/ClimateVision.git\n",
+    "%cd ClimateVision\n",
+    "!git pull origin main  # make sure newest scripts are present\n",
+    "!pip install -q -r requirements.txt\n",
+    "!pip install -q -e .\n",
+    "import torch; print('CUDA:', torch.cuda.is_available(), '| GPU:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'NONE')\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# Sanity check the scripts exist (fail early if not pushed yet)\n",
+    "import os\n",
+    "need = ['scripts/prepare_sen1floods11.py','scripts/train_real.py','scripts/export_model.py']\n",
+    "missing = [p for p in need if not os.path.exists(p)]\n",
+    "assert not missing, f'Missing (push these first): {missing}'\n",
+    "print('All scripts present.')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Persist to Google Drive (so checkpoints survive a runtime reset)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "from google.colab import drive\n",
+    "drive.mount('/content/drive')\n",
+    "!mkdir -p /content/drive/MyDrive/climatevision/models\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Authenticate to Google Cloud (for the Sen1Floods11 bucket)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "from google.colab import auth\n",
+    "auth.authenticate_user()\n",
+    "PROJECT = 'kinos-473422'  # your GCP/GEE project\n",
+    "!gcloud config set project {PROJECT}\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Download Sen1Floods11 hand-labeled data (~hundreds of chips)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "!mkdir -p data/sen1floods11\n",
+    "!gcloud storage cp --recursive gs://sen1floods11/v1.1/data/flood_events/HandLabeled/S2Hand data/sen1floods11/\n",
+    "!gcloud storage cp --recursive gs://sen1floods11/v1.1/data/flood_events/HandLabeled/LabelHand data/sen1floods11/\n",
+    "!gcloud storage cp --recursive gs://sen1floods11/v1.1/splits data/sen1floods11/\n",
+    "print('S2:', len(os.listdir('data/sen1floods11/S2Hand')), '| Labels:', len(os.listdir('data/sen1floods11/LabelHand')))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Convert to the ClimateVision training layout\n",
+    "Extracts S2 bands B03,B08,B11 and pairs masks. Add `--jrc-dir` only for the 3-class variant.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "!python scripts/prepare_sen1floods11.py \\\n",
+    "  --s2-dir data/sen1floods11/S2Hand \\\n",
+    "  --label-dir data/sen1floods11/LabelHand \\\n",
+    "  --splits-dir data/sen1floods11/splits \\\n",
+    "  --out-dir data/datasets/flooding\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6. Train\n",
+    "Watch `val_iou` climb. Early stopping is automatic. ~1\u20133h depending on GPU.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "!python scripts/train_real.py --analysis-type flooding \\\n",
+    "  --data-dir data/datasets/flooding \\\n",
+    "  --epochs 50 --batch-size 8 --image-size 256 \\\n",
+    "  --out /content/drive/MyDrive/climatevision/models\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 7. Locate the best checkpoint\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "import glob\n",
+    "runs = sorted(glob.glob('/content/drive/MyDrive/climatevision/models/flooding_*/best_model.pth'))\n",
+    "assert runs, 'No checkpoint found \u2014 did training finish?'\n",
+    "CKPT = runs[-1]; print('Best checkpoint:', CKPT)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 8. Evaluate + governance gate (promote only if it passes)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "!python scripts/evaluate.py --checkpoint \"$CKPT\" --data-dir data/datasets/flooding || true\n",
+    "!python scripts/governance_ci_gate.py || true\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 9. Export to ONNX (what the API serves)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "!python scripts/export_model.py --checkpoint \"$CKPT\"\n",
+    "import os; d=os.path.dirname(CKPT); print('Artifacts in', d, '->', os.listdir(d))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 10. Download the model, then deploy\n",
+    "\n",
+    "Download `best_model.pth` and `model.onnx` from the run folder above, then on your laptop:\n",
+    "```bash\n",
+    "cp /* models/flooding_/\n",
+    "git add models/flooding_/\n",
+    "git commit -m 'feat(models): trained Sen1Floods11 flood model'\n",
+    "git push origin main # triggers Render rebuild\n",
+    "```\n",
+    "Verify it's live, then drop the preview label:\n",
+    "```bash\n",
+    "curl -s https://climatevision.green/api/health/models | jq\n",
+    "```\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "from google.colab import files\n",
+    "import os\n",
+    "d = os.path.dirname(CKPT)\n",
+    "for f in ['best_model.pth','model.onnx']:\n",
+    "    p = os.path.join(d,f)\n",
+    "    if os.path.exists(p): files.download(p)\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "provenance": [],
+   "toc_visible": true
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/scripts/prepare_sen1floods11.py b/scripts/prepare_sen1floods11.py
new file mode 100644
index 0000000..21153d6
--- /dev/null
+++ b/scripts/prepare_sen1floods11.py
@@ -0,0 +1,169 @@
+"""
+Convert the Sen1Floods11 hand-labeled data into the ClimateVision training layout.
+
+Input (downloaded from gs://sen1floods11/v1.1/data/flood_events/HandLabeled/):
+    S2Hand/     EVENT_CHIPID_S2Hand.tif     13-band Sentinel-2 L1C (TOA * 10000)
+    LabelHand/  EVENT_CHIPID_LabelHand.tif  1-band labels: -1 no-data, 0 not-water, 1 water
+    (optional) JRCWaterHand/ EVENT_CHIPID_JRCWaterHand.tif  permanent-water reference
+
+Output (what scripts/train_real.py + data/dataset.py expect):
+    /train|val|test/
+        images/ EVENT_CHIPID.tif   3-band float32 (config bands: B03,B08,B11)
+        masks/  EVENT_CHIPID.tif   1-band uint8 class indices
+
+Class mapping (matches config.yaml flooding: dry_land=0, permanent_water=1, flooded=2):
+    - binary mode (default): not-water/no-data -> 0 (dry), water -> 1.
+      Class 2 is simply unused; the 3-class model still trains and serves correctly.
+    - 3-class mode (--jrc-dir given): water AND permanent -> 1, water AND NOT permanent -> 2.
+
+Splits: uses the official CSVs in splits/ when present (so results are comparable to
+published numbers); otherwise falls back to a deterministic 80/10/10 split.
+
+Usage:
+    python scripts/prepare_sen1floods11.py \
+        --s2-dir data/sen1floods11/S2Hand \
+        --label-dir data/sen1floods11/LabelHand \
+        --splits-dir data/sen1floods11/splits \
+        --out-dir data/datasets/flooding
+    # optional 3-class: add --jrc-dir data/sen1floods11/JRCWaterHand
+"""
+from __future__ import annotations
+
+import argparse
+import logging
+import re
+from pathlib import Path
+
+import numpy as np
+import rasterio
+
+logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)-8s %(message)s")
+logger = logging.getLogger("prepare_sen1floods11")
+
+# Config bands for flooding (0-indexed into the 13-band S2 stack):
+#   B03 Green = 2, B08 NIR = 7, B11 SWIR-1 = 11
+DEFAULT_BANDS = (2, 7, 11)
+STEM_RE = re.compile(r"([A-Za-z\-]+_\d+)")  # e.g. "Bolivia_103757", "Sri-Lanka_31559"
+
+
+def chip_stem(name: str) -> str | None:
+    """Extract the EVENT_CHIPID stem from a filename or CSV token."""
+    m = STEM_RE.search(name)
+    return m.group(1) if m else None
+
+
+def assign_splits(stems: list[str], splits_dir: Path | None) -> dict[str, str]:
+    """Return {stem: split}. Use official CSVs if found, else deterministic 80/10/10."""
+    mapping: dict[str, str] = {}
+    if splits_dir and splits_dir.exists():
+        wanted = {"train": ["train"], "val": ["valid", "val"], "test": ["test"]}
+        for split, keys in wanted.items():
+            for csv in splits_dir.rglob("*.csv"):
+                if any(k in csv.name.lower() for k in keys):
+                    for line in csv.read_text().splitlines():
+                        s = chip_stem(line)
+                        if s:
+                            mapping[s] = split
+        if mapping:
+            logger.info("Loaded splits from CSVs: %d chips assigned.", len(mapping))
+            return mapping
+
+    # Deterministic fallback split
+    logger.warning("No usable split CSVs found — using deterministic 80/10/10 split.")
+    for i, s in enumerate(sorted(stems)):
+        r = i % 10
+        mapping[s] = "train" if r < 8 else ("val" if r == 8 else "test")
+    return mapping
+
+
+def build_mask(label: np.ndarray, jrc: np.ndarray | None) -> np.ndarray:
+    """Map raw label (-1/0/1) [+ optional JRC permanent water] to class indices 0/1/2."""
+    out = np.zeros(label.shape, dtype=np.uint8)  # default dry (0); -1 no-data -> 0
+    water = label == 1
+    if jrc is None:
+        out[water] = 1  # binary: all water -> class 1
+    else:
+        permanent = jrc == 1
+        out[water & permanent] = 1  # permanent_water
+        out[water & ~permanent] = 2  # flooded
+    return out
+
+
+def convert_one(stem: str, s2_path: Path, label_path: Path, jrc_path: Path | None,
+                bands: tuple[int, ...], out_dir: Path) -> bool:
+    try:
+        with rasterio.open(s2_path) as src:
+            img = src.read().astype(np.float32)  # (13, H, W)
+            profile = src.profile
+        sel = img[list(bands), :, :]  # (3, H, W)
+
+        with rasterio.open(label_path) as src:
+            label = src.read(1)
+        jrc = None
+        if jrc_path and jrc_path.exists():
+            with rasterio.open(jrc_path) as src:
+                jrc = src.read(1)
+        mask = build_mask(label, jrc)
+    except Exception as exc:
+        logger.warning("Skip %s (%s)", stem, exc)
+        return False
+
+    img_out = out_dir / "images" / f"{stem}.tif"
+    msk_out = out_dir / "masks" / f"{stem}.tif"
+    img_out.parent.mkdir(parents=True, exist_ok=True)
+    msk_out.parent.mkdir(parents=True, exist_ok=True)
+
+    prof_img = profile.copy()
+    prof_img.update(count=len(bands), dtype="float32")
+    with rasterio.open(img_out, "w", **prof_img) as dst:
+        dst.write(sel)
+
+    prof_msk = profile.copy()
+    prof_msk.update(count=1, dtype="uint8", nodata=None)
+    with rasterio.open(msk_out, "w", **prof_msk) as dst:
+        dst.write(mask[np.newaxis, :, :])
+    return True
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser(description="Convert Sen1Floods11 to ClimateVision layout.")
+    ap.add_argument("--s2-dir", required=True)
+    ap.add_argument("--label-dir", required=True)
+    ap.add_argument("--splits-dir", default=None)
+    ap.add_argument("--jrc-dir", default=None, help="optional, enables 3-class output")
+    ap.add_argument("--out-dir", required=True)
+    ap.add_argument("--bands", default="2,7,11", help="0-indexed S2 bands (B03,B08,B11)")
+    args = ap.parse_args()
+
+    s2_dir, label_dir = Path(args.s2_dir), Path(args.label_dir)
+    jrc_dir = Path(args.jrc_dir) if args.jrc_dir else None
+    out_root = Path(args.out_dir)
+    bands = tuple(int(b) for b in args.bands.split(","))
+
+    s2_files = {chip_stem(p.name): p for p in s2_dir.glob("*.tif") if chip_stem(p.name)}
+    label_files = {chip_stem(p.name): p for p in label_dir.glob("*.tif") if chip_stem(p.name)}
+    stems = sorted(set(s2_files) & set(label_files))
+    if not stems:
+        logger.error("No matching S2/Label pairs found. Check --s2-dir and --label-dir.")
+        return 2
+    logger.info("Found %d image/label pairs.", len(stems))
+
+    splits = assign_splits(stems, Path(args.splits_dir) if args.splits_dir else None)
+
+    counts = {"train": 0, "val": 0, "test": 0}
+    for stem in stems:
+        split = splits.get(stem, "train")
+        jrc_path = (jrc_dir / f"{stem}_JRCWaterHand.tif") if jrc_dir else None
+        if convert_one(stem, s2_files[stem], label_files[stem], jrc_path, bands,
+                       out_root / split):
+            counts[split] += 1
+
+    logger.info("Done. train=%(train)d val=%(val)d test=%(test)d", counts)
+    logger.info("Output: %s/{train,val,test}/{images,masks}", out_root)
+    logger.info("Next: python scripts/train_real.py --analysis-type flooding "
+                "--data-dir %s --epochs 50 --batch-size 8 --out models", out_root)
+    return 0 if sum(counts.values()) else 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
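
The stem extraction and the deterministic fallback split in `scripts/prepare_sen1floods11.py` can be exercised without rasterio or any downloaded data. The following is a minimal sketch, not part of the patch: it copies the `STEM_RE` pattern and the modulo-10 rule from the script, `fallback_split` is a hypothetical helper name for the fallback branch of `assign_splits`, and the `Demo_###` chip IDs are invented purely for illustration.

```python
import re

# Same pattern as STEM_RE in scripts/prepare_sen1floods11.py.
STEM_RE = re.compile(r"([A-Za-z\-]+_\d+)")


def chip_stem(name):
    """Return the EVENT_CHIPID stem, or None when the name does not match."""
    m = STEM_RE.search(name)
    return m.group(1) if m else None


def fallback_split(stems):
    """Hypothetical stand-in for the fallback branch of assign_splits:
    the split depends only on a stem's position in sorted order, modulo 10."""
    mapping = {}
    for i, s in enumerate(sorted(stems)):
        r = i % 10
        mapping[s] = "train" if r < 8 else ("val" if r == 8 else "test")
    return mapping


# The two example stems named in the script's own comment, plus a non-match.
print(chip_stem("Bolivia_103757_S2Hand.tif"))
print(chip_stem("Sri-Lanka_31559_LabelHand.tif"))
print(chip_stem("README.txt"))

# Invented chip IDs, used only to show the resulting proportions.
stems = [f"Demo_{i:03d}" for i in range(20)]
split = fallback_split(stems)
counts = {k: sum(1 for v in split.values() if v == k) for k in ("train", "val", "test")}
print(counts)
```

Because the fallback assigns by sorted position only, chips from the same flood event tend to land in all three splits. That is one reason the official CSVs are preferable when results need to be compared against published numbers.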