Calibrate LA council tax (band counts + net £) and fix national gross/net#374
vahid-ahmadi wants to merge 10 commits into main
Two families of LA-level targets, covering all 360 LAs in
local_authorities_2021.csv, built from four public sources:
- `ons/council_tax_band_d/{code}` (350 targets): average Band D
council tax inclusive of all precepts per billing authority.
Sources: MHCLG *Council Tax levels set by local authorities in
England 2026-27*, Welsh Government *Council Tax levels April 2026
to March 2027*, Scottish Government *Council Tax Assumptions 2025*.
All 296 English + 22 Welsh + 32 Scottish LAs covered.
- `ons/council_tax_band_count/{code}/{band}` (2,541 targets): number
of dwellings per band A-H per LA. Source: VOA *Council Tax: Stock
of Properties, 2025*. Covers England + Wales (318 LAs × ~8 bands,
minus City of London Band A which is VOA-suppressed).
NI is excluded: domestic rates, not council tax. Scotland band
counts are not in VOA; Scottish Assessors publishes them separately
and is a follow-up.
Files
-----
- `storage/la_council_tax.csv` (31 KB, 360 rows): canonical CSV
joining DLUHC Table 10 column 17, Welsh Table 1 "Overall average
band D", Scottish Gov "CT by Band 2025-26" Band D column, and VOA
CTSOP1.0 bands A-H onto the reference LA list.
- Post-2023 South Yorkshire E-codes (E08000038/39) re-mapped to
pre-2023 codes (E08000016/19) to match the reference list.
- Scottish ampersand/double-space naming normalised
("Argyll & Bute" → "Argyll and Bute", etc.).
- `targets/sources/la_council_tax.py`: reads the CSV, emits Target
objects at geographic_level=LOCAL_AUTHORITY with per-country year
tagging and per-country reference URL.
Testing
-------
22 hermetic tests (no network access, no baseline fixture needed):
Structure
- Row count matches local_authorities_2021.csv.
- Every expected column present.
- Four UK country codes represented.
- Every LA code matches the reference list.
Value plausibility (the #371 lesson)
- Band D amount in [£900, £3,500] for every row with a value.
- Total dwellings in [200, 800,000] for every row with a value.
- Explicit Isles of Scilly regression test: total dwellings in
[500, 5,000], not the 2.49M outlier that slipped into #371.
- Band A-H counts sum to total dwellings within 20-property slack
(VOA 10-property suppression allowance).
- Every band-count target value ≤ 500k (largest LA stock).
Coverage expectations
- Every English, Welsh and Scottish LA has a Band D value.
- Northern Ireland is flagged as having no council tax (has_council_tax=False).
Spot-checks of published facts
- Wandsworth (E09000032) and Westminster (E09000033) are the two
lowest-Band-D English LAs (catches row-swap bugs).
- Scottish average Band D is £500+ below English average.
Target-API invariants
- get_targets() returns a non-empty list without network access.
- Band D target count matches the CSV's non-null Band D count.
- Band count target count matches Σ non-null band columns.
- Every target carries geographic_level=LOCAL_AUTHORITY and a
geo_code.
- Band D targets use Unit.GBP; band count targets use Unit.COUNT
with is_count=True.
- Every target has at least one year of values.
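The value-plausibility checks listed above can be sketched as a single row validator. This is a minimal illustration, not the PR's test code: the rows, the `plausibility_errors` helper and the `counts` field are hypothetical, and only the numeric bounds come from the list above.

```python
# Hypothetical rows shaped like storage/la_council_tax.csv; all values are made up.
ROWS = [
    {"code": "LA_1", "band_d_amount": 2300.0, "total_dwellings": 42160,
     "counts": [20000, 8000, 7000, 4000, 2000, 800, 300, 50]},
    {"code": "LA_2", "band_d_amount": 1900.0, "total_dwellings": 1260,
     "counts": [100, 200, 300, 300, 200, 100, 50, 10]},
]

BAND_D_RANGE = (900.0, 3500.0)     # £ bounds quoted in the test list
DWELLINGS_RANGE = (200, 800_000)   # total-dwellings bounds quoted in the test list
SUM_SLACK = 20                     # allowed gap between sum of bands and published total


def plausibility_errors(row):
    """Return a list of human-readable problems for one LA row (empty if none)."""
    errors = []
    lo, hi = BAND_D_RANGE
    if not lo <= row["band_d_amount"] <= hi:
        errors.append(f"{row['code']}: Band D outside [{lo}, {hi}]")
    lo, hi = DWELLINGS_RANGE
    if not lo <= row["total_dwellings"] <= hi:
        errors.append(f"{row['code']}: total dwellings outside [{lo}, {hi}]")
    if abs(sum(row["counts"]) - row["total_dwellings"]) > SUM_SLACK:
        errors.append(f"{row['code']}: band counts do not sum to total")
    return errors


for row in ROWS:
    assert plausibility_errors(row) == []
```

A row carrying an outlier such as a 2.49M dwelling total would fail both the range check and the sum check, which is the kind of error the Isles of Scilly regression test guards against.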
Sources
-------
- MHCLG (England 2026-27):
https://www.gov.uk/government/statistics/council-tax-levels-set-by-local-authorities-in-england-2026-to-2027
- Welsh Government (Wales 2026-27):
https://www.gov.wales/council-tax-levels-april-2026-march-2027-html
- Scottish Government (Scotland 2025-26):
https://www.gov.scot/publications/council-tax-datasets/
- VOA (England + Wales 2025):
https://www.gov.uk/government/statistics/council-tax-stock-of-properties-2025
Out of scope for this PR (follow-ups)
-------------------------------------
- Wiring these targets into
datasets/local_areas/local_authorities/loss.py so the LA
reweighting actually calibrates on them. Planned follow-up PR.
- Scottish Assessors per-LA chargeable-dwellings to fill the Scotland
band-count gap.
- Council Tax Support caseload per LA (DWP StatXplore).
- Single Person Discount rate per LA (CIPFA).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review points addressed:
- Add count_band_I column to la_council_tax.csv, populated for all 22
Welsh LAs (Wales revalued in 2005 and introduced a 9th band). Cardiff
1480, Monmouthshire 670, Vale of Glamorgan 1060, etc. English rows
keep Band I null; VOA marks it [z] (not applicable).
- Re-source total_dwellings from VOA "All properties" column instead
of deriving it as the sum of A-H. Previously Σ(A..H) was used for
both sides of test_band_counts_sum_to_total, making the test
self-referential; now it validates against the published total with
a 20-property slack for VOA rounding.
- Rename count columns symmetrically: band_A..band_H + band_D_count →
count_band_A..count_band_I. Removes the lopsided band_D_count name
that existed only to avoid clashing with band_d_amount.
- Align band-count target names with voa_council_tax.py:
voa/council_tax/{code}/{band} (was ons/council_tax_band_count/...);
variable="council_tax_band" (was council_tax_band_count, which is
not a real PolicyEngine-UK variable); drop breakdown_variable to
match the regional VOA module.
- Cache the CSV read with @lru_cache(maxsize=1), matching voa_council_tax.
- Update module docstring: "A-H in England/Scotland, A-I in Wales".
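The cached CSV read mentioned in the review points could look roughly like this. It is a sketch under assumptions: the `load_rows` name and its `path` argument are hypothetical, and the real module may read the file with a different library.

```python
import csv
from functools import lru_cache


@lru_cache(maxsize=1)
def load_rows(path):
    """Read the CSV once per path; repeat calls with the same path hit the cache."""
    with open(path, newline="") as f:
        # Return a tuple so callers cannot mutate the cached container.
        return tuple(csv.DictReader(f))
```

With `maxsize=1` only the most recent path is kept, which is enough when every caller reads the same file.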
Tests:
- New: test_welsh_las_have_band_i (all 22 Welsh LAs populated).
- New: test_english_las_have_no_band_i (guard against spurious fills).
- New: test_cardiff_band_i_matches_published_figure (~1,480 per VOA 2025).
Final target counts:
- 350 Band D amount targets (unchanged).
- 2,563 band-count targets, up from 2,541: +22 Welsh Band I plus two
band-H rows that were null due to the earlier truncation.
The targets registered in la_council_tax.py were inert — the LA target
matrix had no columns for them, so the reweighter could not see them.
This wires the eight VOA Council Tax Stock-of-Properties band-count
targets (A-H) into the LA loss matrix:
- matrix entry: per-household indicator 1[council_tax_band == B] from
policyengine-uk.
- y entry: 360-vector of per-LA dwelling counts from
storage/la_council_tax.csv. For LAs without VOA data — Scottish LAs
(the VOA summary tables don't cover Scotland) and Northern Irish LAs
(no council tax) — the value falls back to
national_count × la_household_share, matching the existing tenure
block's fallback pattern.
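The matrix entry and the y fallback described in this commit can be sketched in plain Python (rather than NumPy, to stay dependency-free). The function names are hypothetical; the indicator definition and the national_count × la_household_share fallback are taken from the text above.

```python
import math

BANDS = "ABCDEFGH"


def band_indicator_row(household_band):
    """One matrix row: 1 in the column of the household's band, 0 elsewhere."""
    return [1.0 if household_band == b else 0.0 for b in BANDS]


def la_target(direct_count, national_count, la_household_share):
    """Use the published LA count when present, else the national-share fallback."""
    if direct_count is None or math.isnan(direct_count):
        return national_count * la_household_share
    return direct_count
```

A household with no band produces an all-zero row, so each row sums to at most 1.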
Two targets are deliberately not wired in this pass:
- Band I — Wales-only and mostly null in the CSV.
- The Band D £ amount (ons/council_tax_band_d/{code}) — a per-rate
quantity that does not fit the linear matrix-times-weights
aggregation. Wiring it as total council-tax revenue would need
Scotland-specific band ratios (different from England/Wales after
2017) and is worth a separate PR.
New tests in test_la_loss_council_tax.py cover both layers:
- Light: CSV joins to every LA code, the eight count_band_{X} columns
exist, E/W rows are populated, Scotland is null as documented, and
NI has has_council_tax=False.
- Full build (gated on enhanced FRS fixture): all eight columns present
in matrix and y; y vectors length 360, finite and positive; matrix
entries are 0/1 indicators with rows summing to ≤1; y matches the
CSV verbatim for an English LA (Hartlepool); Scotland and NI LAs
receive a positive fallback rather than NaN or zero.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the second FRS data point into the LA reweighter, addressing
the 28 Apr standup ALIGNED decision: "calibrate the two FRS data
points as the council tax information is provided after deductions."
Both sides of the new constraint are net of CTR:
- matrix col = council_tax_less_benefit (gross − CTR benefit)
- y = directly observed net council tax requirement per LA
Sources (no national-total apportionment, all directly published):
- England (296 LAs): MHCLG Council Taxbase 2025, Table 1.35 "Tax base
after allowance for council tax support" × Band D amount.
Sums to £47.4bn, within 3.4% of the MHCLG Table 1 published England
Council Tax Requirement of £45.86bn (small gap from year mismatch:
2025 taxbase × 2026-27 Band D).
- Wales (22 LAs): Welsh Government "Council Tax Levels April 2026
to March 2027" Table 3 "Council tax income (£m)". Sums to £2.45bn.
- Scotland (32) and NI (10): no source wired; loss.py routes through
the existing national × la_household_share fallback, same pattern
as the band-count target and the rent target.
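As a worked check of the England construction, the per-LA product and the quoted 3.4% gap can be reproduced as follows. The example LA figures are made up and `net_council_tax` is a hypothetical helper; the two £bn totals are the ones quoted above.

```python
def net_council_tax(taxbase_after_cts, band_d_amount):
    """Net council tax (£) = Band D equivalent taxbase after CTS × Band D amount (£)."""
    return taxbase_after_cts * band_d_amount


# Made-up LA: 30,000 Band D equivalents at £2,000 per Band D dwelling.
example_la = net_council_tax(30_000, 2_000.0)   # 60,000,000.0

# Gap between the summed England targets and the published requirement (£bn, from the text).
england_sum_bn = 47.4
published_requirement_bn = 45.86
gap_pct = 100 * (england_sum_bn - published_requirement_bn) / published_requirement_bn
print(f"{gap_pct:.1f}%")   # 3.4%
```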
Mirrors the rent block in loss.py: load CSV → merge into ct_merged →
matrix col / y assignment / has_data mask / national-share fallback.
Files:
- storage/la_council_tax.csv: new column total_council_tax_net.
- targets/sources/la_council_tax.py: load_la_net_council_tax() +
Target objects named housing/council_tax_net/{code}.
- datasets/local_areas/local_authorities/loss.py: housing/council_tax_net
block immediately after the band-count block.
- tests/test_la_loss_council_tax.py: 11 new tests (4 layer-1 +
7 layer-2) covering CSV column presence, country coverage, value
range, England-total ballpark vs MHCLG, matrix-col correctness,
na-fallback behaviour, calibratability sanity check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Edits:
1. Both FRS data points now calibrated at LA level
2. Net of CTR alignment on both sides
3. Net receipts data investigated and used directly (no derivation)
4. Same shape as the existing rent / tenure /
One thing I flagged but did not fix here: at the national level, the OBR
OBR EFO Table 4.1 reports "Total net council tax receipts", which is net of council tax reduction (CTR). The matching household-level signal is council_tax_less_benefit (= gross council tax − CTR award), not council_tax (which is the gross liability before CTR per its docstring "Gross amount spent on Council Tax, before discounts"). Calibrating gross household values against a net national target systematically pulls weights down to fit (Σ w × gross > Σ w × net), leaking bias into adjacent national targets that share the weight vector.
Order-of-magnitude sanity (UK 2024-25):
- Σ w × council_tax (gross) ≈ £55bn
- Σ w × council_tax_less_benefit (net) ≈ £47bn
- OBR Table 4.1 "Total net council tax" ≈ £44bn
After the fix, the council tax constraint is internally consistent (both sides net) and aligns with Max's 28 Apr standup decision on FRS-net-of-CTR alignment. Pairs naturally with the LA-level housing/council_tax_net target this PR adds: both use the same net variable.
Adds three regression tests pinning the net-variable contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
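The size of the pull on the weights can be illustrated with the approximate figures quoted in this commit message. This is a deliberate simplification: it assumes a single uniform rescaling of all weights, whereas the real calibrator adjusts weights non-uniformly under many constraints. `uniform_scale` is a hypothetical helper.

```python
# Approximate figures quoted in the commit message, in £bn.
GROSS_BN = 55.0    # Σ w × council_tax
NET_BN = 47.0      # Σ w × council_tax_less_benefit
TARGET_BN = 44.0   # OBR "Total net council tax"


def uniform_scale(target, weighted_sum):
    """Factor by which all weights must scale for the weighted sum to hit the target."""
    return target / weighted_sum


scale_gross = uniform_scale(TARGET_BN, GROSS_BN)   # 0.80: weights cut by about 20%
scale_net = uniform_scale(TARGET_BN, NET_BN)       # about 0.94: weights cut by about 6%
```

Under this simplification, matching the net target with the gross variable needs roughly three times the downward adjustment that the net variable needs.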
Folded the national fix into this PR (commit
Order-of-magnitude check:
Three regression tests added (
Effect: closes the "[Vahid Ahmadi] Review Data" loop. National council tax calibration is now consistent with the FRS-net framing, and pairs naturally with the LA-level
Northern Ireland uses domestic rates, not council tax. The CSV's has_council_tax flag has been False for NI from the original commit, but loss.py was ignoring it and assigning national × la_household_share to NI LAs for both band counts and the new net £ column.
Effect: the optimiser was being told "NI households should pay this much council tax" with a positive target, while every NI household has council_tax_band == None and council_tax_less_benefit == 0. That is an unsatisfiable constraint that wastes loss the optimiser cannot drive to zero. Reported by @MaxGhenis in PR review.
Fix: read has_council_tax from the CSV, gate the np.where so NI LAs get y == 0 for all 9 council-tax columns. Direct-value and fallback paths unchanged for E/W/S.
Updates two tests that previously asserted positive fallback for NI; adds explicit zero-NI assertion for housing/council_tax_net.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per @MaxGhenis PR review: both council-tax LA targets are derived proxies, not direct matches for the matrix-side variables. The PR description and code comments earlier overstated this.
- voa/council_tax/{A..H}: target counts VOA dwellings (E&W only, includes exempt/empty/second homes); matrix counts policyengine-uk households. Banding ratios differ in Scotland post-2017 and Wales has Band I.
- housing/council_tax_net: target value is MHCLG taxbase × Band D (taxbase = Band D equivalent dwellings adjusted for ~7 discount/premium/exemption classes); matrix col is FRS-reported council_tax_less_benefit (household-reported gross less reported CTB). Same intent, different construction paths.
Documentation only: no code, data, or test behaviour change. The la_council_tax.py docstring now has an explicit "Lineage caveats" section, and loss.py block comments label both targets as derived/proxy with cross-reference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up on the direct-target discussion: I pushed 96f5707 to stop fabricating local council-tax targets where no direct source cell exists. Missing CT band/net cells now stay NaN and calibrate_local_areas masks NaN local target cells out of the loss, rather than filling Scotland/NI with national-share values or hard zeroes. This preserves direct source cells and avoids treating NI zeroes as training constraints when the matrix side may not be zero.
This also adds a toy calibrator regression for sparse local targets. Remaining source-cleanup direction: prefer direct council-tax requirement/income where available, and add Scotland band/net sources before training on those LA cells.
Verification: uv run pytest policyengine_uk_data/tests/test_calibrate_save.py policyengine_uk_data/tests/test_la_council_tax_targets.py policyengine_uk_data/tests/test_la_loss_council_tax.py policyengine_uk_data/tests/test_obr_council_tax.py -q; ruff check/format on touched files.
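The NaN-masking idea can be sketched as follows. This is an illustration only: `masked_mse` is a hypothetical name, and the squared-error form is an assumption, since the actual loss used in utils/calibrate.py is not shown in this thread.

```python
import math


def masked_mse(predicted, target):
    """Mean squared error over cells whose target is not NaN; NaN cells are skipped."""
    terms = [(p - t) ** 2 for p, t in zip(predicted, target) if not math.isnan(t)]
    return sum(terms) / len(terms) if terms else 0.0


nan = float("nan")
predicted = [1.0, 5.0, 3.0]
target = [1.0, nan, 5.0]          # middle LA has no direct source cell

loss = masked_mse(predicted, target)   # (0 + 4) / 2 = 2.0
```

Without the mask, the NaN target would propagate into the sum and make the whole loss NaN, so masking is what lets sparse local targets coexist with fully covered ones.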
@MaxGhenis — nit on
Also Not blocking.
Addressed in After the NaN-masking change, direct source-cell availability is enough: band targets train only where Verification:
What this PR does
Calibrates both FRS council tax data points at LA level, addressing the 28 Apr standup ALIGNED decision ("the model will calibrate the two FRS data points as the council tax information is provided after deductions").
Three new column families added to `datasets/local_areas/local_authorities/loss.py`:
- `1[council_tax_band == B]` for B in A–H: `voa/council_tax/{A..H}` (8 cols)
- `council_tax_less_benefit` (= gross − CTR benefit): `housing/council_tax_net`
Plus a national-level fix: `targets/compute/council_tax.py` now uses `council_tax_less_benefit` instead of `council_tax` for the OBR `obr/council_tax` target (which is "Total net council tax receipts"). Both sides of that constraint are now net of CTR.
LAs missing either input have their `y` cell left as `NaN`; the calibrator (`utils/calibrate.py`) masks NaN cells out of the loss so missing-source LAs don't contribute to training on those targets. No fabricated national-share fallbacks and no hard-zero NI targets; only directly observed cells train the calibrator (commit `96f5707`).
Lineage caveats (flagged in review by @MaxGhenis)
Both council-tax LA targets are derived/proxy targets — products / rescalings of observed inputs, not directly observed LA totals. The PR earlier described them as direct; that was overstated.
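Band counts (`voa/council_tax/{A..H}`): the target counts VOA dwellings (E&W only, including exempt, empty and second homes), while the matrix counts policyengine-uk households by `council_tax_band`.
Net £ paid (`housing/council_tax_net`): the target value is MHCLG taxbase × Band D, while the matrix column is FRS-reported `council_tax_less_benefit` (household-reported gross less reported CTB).
A separate policy question, whether derived/proxy targets like these should sit at full training weight alongside directly observed targets (HMRC SPI, ONS pop, DWP UC, VOA dwellings), is being tracked in #381 and is not blocking this PR.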
Closes #370.
Sources
Sanity check. Across the 318 LAs with a directly observed net £ figure, the per-LA targets sum to ~£49.9bn (England £47.4bn + Wales £2.45bn). This roughly reconciles with MHCLG's published England-only Council Tax Requirement of £45.86bn (small gap from year mismatch — 2025 taxbase × 2026-27 Band D). The reconciliation is for sanity only; the calibration constraint operates per-LA, not on the national aggregate.
Files
New / modified
- `policyengine_uk_data/storage/la_council_tax.csv`: adds `total_council_tax_net` column for E/W LAs.
- `policyengine_uk_data/targets/sources/la_council_tax.py`: `load_la_net_council_tax()` helper + `Target` objects named `housing/council_tax_net/{code}`. Module docstring documents derived/proxy nature + lineage caveats.
- `policyengine_uk_data/targets/compute/council_tax.py`: switches OBR national matrix col from `council_tax` (gross) to `council_tax_less_benefit` (net) so both sides of the constraint are net of CTR.
- `policyengine_uk_data/datasets/local_areas/local_authorities/loss.py`: `housing/council_tax_net` block immediately after the band-count block. Both blocks leave missing-source cells as `NaN`; the calibrator masks them.
- `policyengine_uk_data/utils/calibrate.py`: calibrator updated to mask NaN cells out of the per-LA loss so sparse local targets are first-class.
- `policyengine_uk_data/tests/test_la_loss_council_tax.py`: layer-1 CSV/coverage tests + layer-2 FRS-fixture-gated wiring + calibratability + NaN-masking assertions.
- `policyengine_uk_data/tests/test_calibrate_save.py`: toy calibrator regression locking in the NaN-masking contract.
- `policyengine_uk_data/tests/test_obr_council_tax.py`: 3 tests pinning the net-variable contract for the national OBR target.
Tests
- `total_council_tax_net` column present, England + Wales fully covered, Scotland + NI absent (masked territory), value range £2m–£1.5bn (lower bound for Isles of Scilly), covered total in £43–50bn ballpark.
- `housing/council_tax_net` column present in matrix and y, matrix col equals `sim.calculate("council_tax_less_benefit")`, direct cells finite, English LA y matches CSV exactly, Scotland + NI cells are NaN (masked, not fabricated), covered-LA target sum within 0.3–3× of weighted initial covered-LA net CT (calibratability sanity).
- `compute_obr_council_tax` returns `council_tax_less_benefit` not `council_tax`; country masks apply correctly; gross variable not queried.
Full run incl. adjacent suites: no regressions.
Out of scope (follow-ups)
Related
c330b44).