9 changes: 8 additions & 1 deletion reports/technical_risk_register.md
@@ -434,7 +434,14 @@ Tier 2: structural fragility under the realistic change of wiring to global, wit

`UNFAOPostProcessorManager` subclasses **two concrete** pipeline-core base classes (`PostprocessorManager`, `ForecastingModelManager`) and **interleaves infrastructure** (env reading, `AppwriteConfig` construction, `DatastoreModule`, path resolution) with the FAO **business logic** (GAUL enrichment, the 9-column null gate) inside the lifecycle hooks. Consequences: (a) the FAO logic cannot be instantiated or unit-tested without the full framework, the Appwrite env, and viewser; (b) **pandas cannot leave the delivery path** because the inherited data loader and `PGMDataset` are pandas-based, so removal is gated on pipeline-core's own DataFrame retirement; (c) **SDP (Stable Dependencies Principle) exposure**: heavy *inheritance* coupling to a pipeline-core that is itself unstable (mid-migration), so upstream changes break far from their cause (cf. C-27, C-29); (d) it is the repo's only composition-over-inheritance violation. The dependency itself is correct (`unfao.py` genuinely *is* a pipeline-core postprocessor); the issue is its **blast radius**. Mitigation (does **not** fight the Template-Method framework): keep the subclass as a **thin shell** but extract `enrich` + `validate` + the 9-column contract into a pipeline-core-free core object the manager *calls*, and wrap the Appwrite I/O behind a small delivery-sink adapter (Dependency Inversion Principle, DIP). This makes the FAO logic testable standalone and insulates it from pipeline-core churn.
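
A minimal sketch of the thin-shell shape described above, under stated assumptions. Every name here (`FaoCore`, `DeliverySink`, `InMemorySink`, `ThinShellManager`, the `postprocess` hook, the row-as-dict representation, and the three placeholder columns) is hypothetical and chosen for illustration; none of it is taken from `unfao.py` or pipeline-core, and the real 9-column contract is not reproduced. The point is only the dependency direction: the core imports nothing from the framework, and the shell delegates.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping, Protocol, Sequence, Tuple

Row = Dict[str, Any]  # hypothetical record type; a frame type could replace it later

# Placeholder for the 9-column contract. The real column names are not
# given in this register, so these three are illustrative only.
REQUIRED_COLUMNS: Tuple[str, ...] = ("cell_id", "prediction", "gaul_name")


class DeliverySink(Protocol):
    """Hypothetical port for delivery I/O; an Appwrite adapter would implement it."""

    def deliver(self, rows: Sequence[Row]) -> None: ...


class FaoCore:
    """Business logic only: no env reads, no Appwrite, no pipeline-core imports."""

    def __init__(self, gaul_lookup: Mapping[int, str]) -> None:
        self._gaul = gaul_lookup

    def enrich(self, rows: Sequence[Row]) -> List[Row]:
        # Attach the GAUL name for each cell; unknown cells get None.
        return [{**r, "gaul_name": self._gaul.get(r["cell_id"])} for r in rows]

    def validate(self, rows: Sequence[Row]) -> None:
        # Null gate: every required column must be present and non-null.
        for i, row in enumerate(rows):
            missing = [c for c in REQUIRED_COLUMNS if row.get(c) is None]
            if missing:
                raise ValueError(f"row {i}: null or missing columns {missing}")


@dataclass
class InMemorySink:
    """Test double for the delivery sink; records what would be shipped."""

    delivered: List[Row] = field(default_factory=list)

    def deliver(self, rows: Sequence[Row]) -> None:
        self.delivered.extend(rows)


class ThinShellManager:
    """Stands in for the framework subclass: the hook only delegates."""

    def __init__(self, core: FaoCore, sink: DeliverySink) -> None:
        self._core = core
        self._sink = sink

    def postprocess(self, rows: Sequence[Row]) -> List[Row]:
        enriched = self._core.enrich(rows)
        self._core.validate(enriched)  # raises before anything is delivered
        self._sink.deliver(enriched)
        return enriched
```

With this split, `FaoCore` can be constructed in a unit test from a plain dict, and the shell can be exercised against `InMemorySink` without any Appwrite environment. In the real manager the delegation would sit inside whichever lifecycle hooks pipeline-core defines.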

**Priority raised by the samples/uncertainty requirement (2026-06-27).** The delivery is moving from point estimates to **predictions-with-uncertainty (S samples per cell)**. This makes the pandas gate (consequence b) materially worse, not just cosmetic:
- **views-frames** stores a distribution as a native contiguous `(N, S)` float32 array (sample axis always explicit; point = S=1). **pandas `PGMDataset`** stores it as **object-dtype list-in-cell** — each of N cells holds a separate length-S numpy array (`_ViewsDataset._convert_to_arrays` / `_check_prediction_samples` in pipeline-core `data/handlers.py`).
- Cost of the object-dtype representation scales ~linearly with S: memory blow-up (this is pipeline-core's own OOM, **#181/#189** — "~18 GB, kills runs" off list-in-cell DataFrames), an encode/decode tax at every parquet/API boundary (faoapi's inverse `_convert_to_arrays`), and a silent `np.resize` pad on mismatched sample counts (feature path).
- At S=1 the penalty is invisible; at S≈1000 it dominates → the uncertainty work is the strongest driver for the frame migration.
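
The two layouts contrasted in the bullets above can be illustrated with NumPy alone. This is a sketch under assumptions, not pipeline-core or views-frames code: the sizes `N` and `S` are arbitrary, an object-dtype NumPy array stands in for a pandas object column, and the `np.resize` call shows only what that function does to a too-short sample vector, not how the feature path invokes it.

```python
import sys

import numpy as np

N, S = 1_000, 100  # illustrative sizes only
rng = np.random.default_rng(0)

# Native layout: one contiguous (N, S) float32 block.
native = rng.random((N, S), dtype=np.float32)

# List-in-cell layout: N object cells, each holding its own length-S array.
cells = np.empty(N, dtype=object)
for i in range(N):
    cells[i] = native[i].copy()

# Footprint of each layout. The object layout pays for the pointer array
# plus a full ndarray header on every cell, on top of the sample data.
native_bytes = native.nbytes
cell_bytes = cells.nbytes + sum(sys.getsizeof(c) for c in cells)

# Decode step needed at every boundary that wants a dense array back.
restored = np.stack(list(cells))

# np.resize does not raise on a length mismatch: it fills the extra slots
# by repeating the input from the start.
short = np.array([1.0, 2.0, 3.0], dtype=np.float32)
padded = np.resize(short, 5)  # array([1., 2., 3., 1., 2.], dtype=float32)
```

The last line is the reason the pad is easy to miss: a cell with three samples resized to five yields five plausible-looking values, two of which are repeats rather than draws.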

**Concrete pipeline-core gate (what "DataFrame retirement" actually requires).** vpp inherits three *concrete* pandas pieces (the abstract `PostprocessorManager` base is fine): the input loader (`ViewsDataLoader.get_data` → parquet→pandas), the container (`PGMDataset`/`_ViewsDataset`, object-dtype cells), and the prediction-store parquet I/O. Closing the gate = pipeline-core **Epic #186 + #207** landing: frame-native input loader (**#161**, gated on the datafactory↔core output contract **#162/#136**), a frame container replacing/re-backing `PGMDataset` (**#159**), frame/arrow store I/O, and retiring the legacy pandas report path (**#211**, the #181 OOM source). It is an epic, not a PR. The half **vpp owns and can do now** (unblocked): the thin-shell de-inheritance above — extract enrich/validate into a pipeline-core-free core the manager *calls*, so the eventual frame swap is a one-seam change. Dependency direction confirmed: pipeline-core does **not** import vpp; vpp **is-a** pipeline-core postprocessor (Template-Method subclass), so it inherits pipeline-core's representation rather than choosing its own.

See also C-07/C-27/C-29 (pipeline-core coupling symptoms), C-39 (the dead-mapper cleanup that precedes any unfao restructuring), and **#45** (the delivery-side draw carrier — ship `(N, S)` uncollapsed as a native frame, the producer half of this same problem).

---
