Epic: #85 · S4 · [UNILATERAL — low priority]
Background
GaulLookupEnricher is OWN-CHOICE pandas: it loads the GAUL lookup via pd.read_parquet (enrichment.py:48) and attaches metadata with base.merge(left_on=pg_id_col, right_index=True, how="left") (enrichment.py:117-119). This is a keyed metadata-attach join, not frame algebra. It does not block samples (it attaches geo columns to whatever rows exist), so this is low priority.
Work
- Replace pd.read_parquet(lookup) + .merge with a pandas-free keyed gather: load the lookup once into a priogrid_gid → row numpy/dict structure (via pyarrow.parquet, already imported for _read_version), then gather metadata by cell id, producing NaN/sentinel for misses to preserve the fail-loud null behaviour.
- Explicitly do NOT push this into views-frames: PredictionFrame carries values + a SpatioTemporalIndex, not arbitrary GAUL columns; modelling a 9-column geographic attach as a frame op would distort the value-object contract. The target is a plain numpy/arrow keyed gather.
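A minimal sketch of the keyed gather described in the Work items, assuming the lookup has already been read into numpy arrays (in enrichment.py these would presumably come from pyarrow.parquet; that step is omitted so the sketch runs without a parquet file). The helper names build_lookup and gather and the area_km2 column are hypothetical, and the dtype handling for misses is an assumption chosen to mirror what a pandas left merge does.

```python
import numpy as np


def build_lookup(gids, columns):
    """Sort the lookup once by cell id so every later gather is a binary search."""
    order = np.argsort(gids, kind="stable")
    return gids[order], {name: col[order] for name, col in columns.items()}


def gather(sorted_gids, sorted_cols, query_gids):
    """Return per-column arrays aligned with query_gids, plus a hit mask.

    Assumes a non-empty lookup with unique cell ids.
    """
    pos = np.searchsorted(sorted_gids, query_gids)
    pos = np.minimum(pos, len(sorted_gids) - 1)  # keep out-of-range ids indexable
    hit = sorted_gids[pos] == query_gids
    out = {}
    for name, col in sorted_cols.items():
        vals = col[pos]  # fancy indexing copies, so the lookup is never mutated
        if not hit.all():
            # Assumption: mimic a pandas left merge, where integer columns are
            # upcast to float64 and non-numeric columns become object on a miss.
            if vals.dtype.kind in "iu":
                vals = vals.astype(np.float64)
            elif vals.dtype.kind != "f":
                vals = vals.astype(object)
            vals[~hit] = np.nan
        out[name] = vals
    return out, hit


# Toy lookup: three cells, deliberately unsorted.
gids = np.array([30, 10, 20])
columns = {
    "country_iso_a3": np.array(["CCC", "AAA", "BBB"], dtype=object),
    "area_km2": np.array([3.0, 1.0, 2.0]),  # hypothetical column
}
sorted_gids, sorted_cols = build_lookup(gids, columns)

# Cell ids 99 and 5 are absent from the lookup and must come back as NaN.
meta, hit = gather(sorted_gids, sorted_cols, np.array([20, 99, 10, 5]))
```

The hit mask is returned alongside the columns so that a caller can count misses without re-scanning the gathered arrays for nulls.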
Acceptance criteria
- The country_iso_a3.isna() warning path fires identically.
Parity / validation
Use tests/test_enrichment.py + tests/test_append_metadata.py as the oracle: feed identical inputs to the old merge and the new gather, and assert equality including NaN positions and dtypes.
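One way the parity check could look, as a sketch only: old_merge reproduces the merge call quoted in Background, while new_gather is a hypothetical stand-in for the pandas-free path (pandas appears in it only to unpack the inputs and wrap the result for comparison). The value and gaul_code columns are invented for illustration, and the lookup is assumed to be non-empty with unique cell ids.

```python
import numpy as np
import pandas as pd


def old_merge(base, lookup, pg_id_col):
    # The current pandas path, as quoted in the Background section.
    return base.merge(lookup, left_on=pg_id_col, right_index=True, how="left")


def new_gather(base, lookup, pg_id_col):
    # Hypothetical stand-in for the pandas-free keyed gather.
    keys = lookup.index.to_numpy()
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    query = base[pg_id_col].to_numpy()
    pos = np.minimum(np.searchsorted(sorted_keys, query), len(sorted_keys) - 1)
    hit = sorted_keys[pos] == query
    out = base.copy()
    for name in lookup.columns:
        col = lookup[name].to_numpy()[order][pos]
        if not hit.all():
            # Mirror the left merge: ints upcast to float64, floats keep their
            # dtype, everything else becomes object; misses are NaN.
            if col.dtype.kind in "iu":
                col = col.astype(np.float64)
            elif col.dtype.kind != "f":
                col = col.astype(object)
            col[~hit] = np.nan
        # Passing dtype explicitly stops pandas from re-inferring the column type.
        out[name] = pd.Series(col, index=base.index, dtype=col.dtype)
    return out


base = pd.DataFrame({"priogrid_gid": [20, 99, 10], "value": [0.1, 0.2, 0.3]})
lookup = pd.DataFrame(
    {"country_iso_a3": ["CCC", "AAA", "BBB"], "gaul_code": [3, 1, 2]},
    index=pd.Index([30, 10, 20], name="priogrid_gid"),
).astype({"country_iso_a3": object})

old = old_merge(base, lookup, "priogrid_gid")
new = new_gather(base, lookup, "priogrid_gid")

# assert_frame_equal compares values, dtypes, column order and NaN positions.
pd.testing.assert_frame_equal(old, new)
```

Because cell id 99 is missing from the lookup, the example exercises the dtype upcast and the NaN placement that the parity criterion calls out; a frame with no misses would not.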
Dependencies
Independent of S1–S3 (can land any time). Keep gaul_schema.py (the 9-column contract) as the single source of truth; coordinate with S5.
Files
views_postprocessing/unfao/enrichment.py, tests/test_enrichment.py, tests/test_append_metadata.py.