diff --git a/reports/technical_risk_register.md b/reports/technical_risk_register.md
index 1113a1b..e112763 100644
--- a/reports/technical_risk_register.md
+++ b/reports/technical_risk_register.md
@@ -434,7 +434,14 @@ Tier 2: structural fragility under the realistic change of wiring to global, wit
 
 `UNFAOPostProcessorManager` subclasses **two concrete** pipeline-core base classes (`PostprocessorManager`, `ForecastingModelManager`) and **interleaves infrastructure** (env reading, `AppwriteConfig` construction, `DatastoreModule`, path resolution) with the FAO **business logic** (GAUL enrichment, the 9-column null gate) inside the lifecycle hooks. Consequences: (a) the FAO logic cannot be instantiated or unit-tested without the full framework + Appwrite env + viewser; (b) **pandas cannot leave the delivery path** because the inherited data loader and `PGMDataset` are pandas — gated on pipeline-core's own DataFrame retirement; (c) **SDP exposure** — heavy *inheritance* coupling to a pipeline-core that is itself unstable (mid-migration), so upstream changes break far from their cause (cf. C-27, C-29); (d) it's the repo's only composition-over-inheritance violation. The dependency itself is correct (`unfao.py` genuinely *is* a pipeline-core postprocessor) — the issue is its **blast radius**. Mitigation (does **not** fight the Template-Method framework): keep the subclass as a **thin shell** but extract `enrich` + `validate` + the 9-column contract into a pipeline-core-free core object the manager *calls*, and wrap the Appwrite I/O behind a small delivery-sink adapter (DIP). This makes the FAO logic testable standalone and insulates it from pipeline-core churn.
 
-See also C-07/C-27/C-29 (pipeline-core coupling symptoms), C-39 (the dead-mapper cleanup that precedes any unfao restructuring).
+**Priority raised by the samples/uncertainty requirement (2026-06-27).** The delivery is moving from point estimates to **predictions-with-uncertainty (S samples per cell)**. This makes the pandas gate (consequence b) materially worse, not just cosmetic:
+- **views-frames** stores a distribution as a native contiguous `(N, S)` float32 array (sample axis always explicit; point = S=1). **pandas `PGMDataset`** stores it as **object-dtype list-in-cell** — each of N cells holds a separate length-S numpy array (`_ViewsDataset._convert_to_arrays` / `_check_prediction_samples` in pipeline-core `data/handlers.py`).
+- Cost of the object-dtype representation scales ~linearly with S: memory blow-up (this is pipeline-core's own OOM, **#181/#189** — "~18 GB, kills runs" off list-in-cell DataFrames), an encode/decode tax at every parquet/API boundary (faoapi's inverse `_convert_to_arrays`), and a silent `np.resize` pad on mismatched sample counts (feature path).
+- At S=1 the penalty is invisible; at S≈1000 it dominates → the uncertainty work is the strongest driver for the frame migration.
+
+**Concrete pipeline-core gate (what "DataFrame retirement" actually requires).** vpp inherits three *concrete* pandas pieces (the abstract `PostprocessorManager` base is fine): the input loader (`ViewsDataLoader.get_data` → parquet→pandas), the container (`PGMDataset`/`_ViewsDataset`, object-dtype cells), and the prediction-store parquet I/O. Closing the gate = pipeline-core **Epic #186 + #207** landing: frame-native input loader (**#161**, gated on the datafactory↔core output contract **#162/#136**), a frame container replacing/re-backing `PGMDataset` (**#159**), frame/arrow store I/O, and retiring the legacy pandas report path (**#211**, the #181 OOM source). It is an epic, not a PR. The half **vpp owns and can do now** (unblocked): the thin-shell de-inheritance above — extract enrich/validate into a pipeline-core-free core the manager *calls*, so the eventual frame swap is a one-seam change. Dependency direction confirmed: pipeline-core does **not** import vpp; vpp **is-a** pipeline-core postprocessor (Template-Method subclass), so it inherits pipeline-core's representation rather than choosing its own.
+
+See also C-07/C-27/C-29 (pipeline-core coupling symptoms), C-39 (the dead-mapper cleanup that precedes any unfao restructuring), and **#45** (the delivery-side draw carrier — ship `(N, S)` uncollapsed as a native frame, the producer half of this same problem).
 
 ---
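
A minimal sketch of the thin-shell mitigation described in this hunk, using only the standard library. Every name in it (`FaoCore`, `DeliverySink`, `InMemorySink`, `ThinShellManager`, `gaul_name`, `cell_id`, `required_columns`) is hypothetical and chosen for illustration; the real 9-column contract, the GAUL lookup, and the pipeline-core hook names are not given in this register, and rows are modelled as plain dicts rather than pandas or frame objects. In the real code the shell would presumably still subclass the pipeline-core base classes; here it is a standalone stand-in so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol

Row = dict[str, Any]


class DeliverySink(Protocol):
    """Port for delivery I/O (Appwrite in production, a fake in tests)."""

    def deliver(self, rows: list[Row]) -> None: ...


@dataclass(frozen=True)
class FaoCore:
    """Framework-free business logic: enrichment plus the null gate."""

    # Placeholder for the real 9-column contract (column names are assumed).
    required_columns: tuple[str, ...]

    def enrich(self, rows: list[Row], gaul_by_cell: dict[Any, str]) -> list[Row]:
        # Attach an admin-unit label to each row without mutating the input.
        return [
            {**row, "gaul_name": gaul_by_cell.get(row["cell_id"])}
            for row in rows
        ]

    def validate(self, rows: list[Row]) -> None:
        # Null gate: every required column must be present and non-null.
        for i, row in enumerate(rows):
            missing = [c for c in self.required_columns if row.get(c) is None]
            if missing:
                raise ValueError(f"row {i}: null or missing columns {missing}")


@dataclass
class InMemorySink:
    """Test double that records what would have been delivered."""

    delivered: list[Row] = field(default_factory=list)

    def deliver(self, rows: list[Row]) -> None:
        self.delivered.extend(rows)


class ThinShellManager:
    """Stand-in for the manager subclass: it only wires the core to the sink."""

    def __init__(
        self, core: FaoCore, sink: DeliverySink, gaul_by_cell: dict[Any, str]
    ) -> None:
        self._core = core
        self._sink = sink
        self._gaul_by_cell = gaul_by_cell

    def run(self, rows: list[Row]) -> list[Row]:
        enriched = self._core.enrich(rows, self._gaul_by_cell)
        self._core.validate(enriched)  # raises before anything is delivered
        self._sink.deliver(enriched)
        return enriched
```

Under these assumptions the core can be constructed and tested with no framework, environment variables, or network access, and swapping the row representation later touches only `FaoCore` and the sink adapter.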