Make the model comparison properly reflect PolicyEngine: audited data + host-model UI #43

Open
MaxGhenis wants to merge 4 commits into master from improve-model-comparisons

@MaxGhenis

Contributor

Improves the open microsimulation reference so it accurately and richly represents PolicyEngine alongside its peers, on both the data and UI sides.

Make the comparison reflect the actual model

Every PolicyEngine row was audited against the model code at v1.722.5 (2026-06-09):

  • Coverage corrections (9 rows): estate tax moves from not-implemented to partial (the IRC §2001(c) rate schedule and §2010 unified credit are implemented over an exogenous taxable estate); OASI notes now document the full PIA/AIME benefit engine (88 test cases) alongside the survey-reported microsim input; unemployment insurance (UI) notes the NJ + PA benefit-determination engines; LIHEAP drops a nonexistent federal-module claim in favor of the actual DC/IL/MA/TX (+ Riverside County) implementations; CCDF (~21 states), TANF (28 + DC), SSI state supplements (20 states), state EITC (28 + DC), and state CTC (17 + DC) counts now match the code; Section 8 is partial per the model's own metadata.
  • Eight new programs that PolicyEngine models end to end but the matrix lacked: school meals, Pell Grant, Head Start, Lifeline, ACP, local income taxes (NYC/Philadelphia/MD counties), AMT, and NIIT, each grounded in variable paths and test counts.
  • Accuracy benchmarks are now real. The placeholder "predicted value pending" rows are replaced with actual model runs against 2024 administrative targets: SNAP +2.3% (USDA FY2024), EITC −3.2% (IRS TY2024), Medicaid enrollment +5.5% (CMS Dec 2024), CTC +11.5% (IRS SOI TY2022), and SSI −13%/−22%, with the take-up gap stated plainly. Income tax is deliberately excluded: the current enhanced CPS build overstates AGI via an inflated miscellaneous_income imputation (tracked in policyengine-us-data#1107, which reports enhanced_cps_2024 overshooting the CBO income_tax target by ~1.86x across 2024-2026 because loss weighting drowns out aggregate targets), so publishing it would misrepresent the model.
  • Transparency counts filled in with API-backed sources: 132 US / 23 UK named contributors and 21,697 US / 1,084 UK named test cases.
  • Usage rows replace blanket unknowns with verified facts: HM Treasury's Algorithmic Transparency Record on its PolicyEngine UK pilot, The Times coverage, the Nuffield grant, a US House press release citing PolicyEngine, Niskanen's CTC report, MyFriendBen + Benefit Navigator API integrations, and the NBER TAXSIM-emulator MoU.
  • Citation fix: the JOSS DOI cited in models/transparency (10.21105/joss.04494) belongs to an unrelated paper; rows now link the actual JOSS submission, which is under review, and keep academicCitations set to unknown.
  • Behavioral rows now show both sides: static-by-default stays, and the optional CBO-derived presets (income −0.05, substitution 0.22–0.31 by decile, capital gains −0.79) are documented as off-by-default capabilities.
  • Modeling mechanics added: state/congressional-district and local-authority/constituency calibrated weights, continuous test suite + TAXSIM cross-validation, off-by-default behavioral-response design.
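The off-by-default behavioral design described above amounts to an explicit opt-in switch. The sketch below illustrates it with hypothetical names (`BehavioralElasticities`, `resolveElasticities`, `useCboPreset`); it is not PolicyEngine's parameter API. Because the preset's substitution elasticity varies by decile and only the 0.22–0.31 range is stated here, the sketch records just the range endpoints.

```typescript
// Hypothetical shape for the behavioral settings a comparison row describes;
// names are illustrative, not PolicyEngine's actual parameter paths.
interface BehavioralElasticities {
  income: number;
  // The CBO-derived preset varies this by income decile; only the documented
  // range endpoints are recorded here.
  substitution: { minByDecile: number; maxByDecile: number };
  capitalGains: number;
}

// Static default: every elasticity is zero, so there is no behavioral response.
const STATIC_DEFAULT: BehavioralElasticities = {
  income: 0,
  substitution: { minByDecile: 0, maxByDecile: 0 },
  capitalGains: 0,
};

// Optional CBO-derived preset, using the values listed in this PR.
const CBO_PRESET: BehavioralElasticities = {
  income: -0.05,
  substitution: { minByDecile: 0.22, maxByDecile: 0.31 },
  capitalGains: -0.79,
};

// Off by default: the preset applies only when the caller explicitly opts in.
function resolveElasticities(
  options: { useCboPreset?: boolean } = {},
): BehavioralElasticities {
  return options.useCboPreset === true ? CBO_PRESET : STATIC_DEFAULT;
}

// True when every elasticity is zero, i.e. a purely static simulation.
function isStatic(e: BehavioralElasticities): boolean {
  return (
    e.income === 0 &&
    e.substitution.minByDecile === 0 &&
    e.substitution.maxByDecile === 0 &&
    e.capitalGains === 0
  );
}

console.log(isStatic(resolveElasticities())); // true
console.log(resolveElasticities({ useCboPreset: true }).capitalGains); // -0.79
```

Making the zero-elasticity object the default means a caller who omits the flag gets the static results that the matrix describes as PolicyEngine's default.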

Fairer, deeper peer rows

  • TRIM3 is credited with modeling LIHEAP (per the Urban overview and the ASPE TRIM3 brief; the older boreas list is non-exhaustive), while ACA marketplace subsidies are attributed to Urban's sibling HIPSM model rather than TRIM3.
  • Independent Census Bureau evaluation benchmarks for TY2012: TRIM3 reaches 87% of the IRS income-tax target, 73% of EITC, and 101% of CTC; TAXSIM reaches 88% of income tax and 73% of EITC. Also added: Tax-Calculator's sub-dollar agreement with TAXSIM-35, and six UKMOD simulated-vs-official 2023 benchmarks from the CeMPA country report, including the documented 37% Housing Benefit shortfall.
  • TPC's SNAP/TANF/SSI cells are marked not-implemented, quoting the primary source (TPC uses TRIM3 tabulations to adjust reported amounts; the programs are not simulated); Tax-Calculator is marked federal-only (no state EITC/CTC); entitledto is marked as covering the Scottish Child Payment.
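The peer benchmarks above are stated as a share of the official target, while PolicyEngine's rows use a signed deviation, so it helps to see that the two forms are interchangeable: TRIM3's 87% of the income-tax target corresponds to -13%. The sketch below uses hypothetical helper names, and its `agreesWithin` check assumes that "sub-dollar agreement" means every paired per-record result differs by less than $1.

```typescript
// Hypothetical helpers; names are illustrative, not the site's actual code.

// "Simulated as % of official target" -> signed percent deviation.
// e.g. 87 (% of target) -> -13, 101 (% of target) -> 1.
function shareToDeviationPct(shareOfTargetPct: number): number {
  return shareOfTargetPct - 100;
}

// Assumed reading of "sub-dollar agreement": every paired per-record
// result differs by less than `tolerance` dollars.
function agreesWithin(
  a: readonly number[],
  b: readonly number[],
  tolerance = 1,
): boolean {
  if (a.length !== b.length) {
    throw new RangeError("result arrays must have the same length");
  }
  return a.every((value, i) => Math.abs(value - b[i]) < tolerance);
}

console.log(shareToDeviationPct(87)); // -13
console.log(shareToDeviationPct(101)); // 1
console.log(agreesWithin([1000, 250.4], [1000.3, 250])); // true
```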

UI: the host model reads as "ours"

  • The PolicyEngine column/row gets a teal tint and a "This model" chip across the coverage matrix, validation, behavioral, calibration, pipeline, and About panels.
  • A sticky program column keeps row labels in view while scrolling horizontally across the 23-peer matrices.
  • Compare drawer groups peers by sector (Government / Non-profit / Academic / For-profit) with model-type sublabels.
  • Validation rows show percent deviation from the target ("+2.3% vs target", "within 0.1% of target"), and UK benchmarks are formatted in GBP.
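The deviation labels and GBP formatting described above can be sketched as follows. The function names, the 0.1% tolerance that triggers the "within" wording, and whole-unit rounding are assumptions for illustration; the site's actual formatting code may differ.

```typescript
// Hypothetical helpers sketching the validation-row labels; names and the
// 0.1% tolerance are assumptions, not the site's actual code.

// Signed percent deviation of a modeled value from its administrative target.
function deviationPct(modeled: number, target: number): number {
  if (target === 0) throw new RangeError("target must be non-zero");
  return ((modeled - target) / target) * 100;
}

// Label such as "+2.3% vs target" or "within 0.1% of target".
function deviationLabel(modeled: number, target: number, tolerancePct = 0.1): string {
  const pct = deviationPct(modeled, target);
  if (Math.abs(pct) <= tolerancePct) {
    return `within ${tolerancePct}% of target`;
  }
  const sign = pct > 0 ? "+" : "-";
  return `${sign}${Math.abs(pct).toFixed(1)}% vs target`;
}

type Currency = "USD" | "GBP";

// Whole-unit currency formatting: en-GB locale for GBP, en-US for USD.
function formatAmount(value: number, currency: Currency): string {
  const locale = currency === "GBP" ? "en-GB" : "en-US";
  return new Intl.NumberFormat(locale, {
    style: "currency",
    currency,
    minimumFractionDigits: 0,
    maximumFractionDigits: 0,
  }).format(value);
}

console.log(deviationLabel(102.3, 100)); // "+2.3% vs target"
console.log(deviationLabel(96.8, 100)); // "-3.2% vs target"
console.log(deviationLabel(100.05, 100)); // "within 0.1% of target"
console.log(formatAmount(1234, "GBP"));
```

Computing the deviation from raw modeled and target values, rather than storing pre-rounded percentages, keeps the label consistent with whatever numbers a row cites.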

Tests: 77 passing (11 new); `bun run build` clean; `eslint .` clean.

🤖 Generated with Claude Code

MaxGhenis and others added 4 commits June 9, 2026 23:58
Comparison tables now distinguish 'this model' from peers: the
PolicyEngine column/row gets a teal tint and a 'This model' chip in the
coverage matrix, validation benchmarks, behavioral tables, calibration,
pipeline, and About panels. The coverage matrix pins the program column
while scrolling horizontally, the compare drawer groups peers by sector
with model-type sublabels, and validation rows show percent deviation
from the administrative target (with GBP formatting for UK benchmarks).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Corrects nine PolicyEngine US rows that no longer matched the code:
estate tax is partial (the IRC 2001(c) schedule and unified credit are
implemented over an exogenous taxable estate), OASI documents the full
PIA/AIME benefit engine alongside the survey-reported microsim input,
UI notes the NJ and PA benefit engines, LIHEAP corrects a nonexistent
federal-module claim to the DC/IL/MA/TX(+Riverside) state programs,
CCDF updates to the ~21-state ruleset, TANF quantifies 28 states + DC,
SSI state supplements list all 20 modeled states, state EITC/CTC counts
match the code (28+DC, 17+DC), and Section 8 is partial per the model's
own metadata (AMI inputs cover selected geographies).

Adds eight programs PolicyEngine models end to end that the matrix
lacked — school meals, Pell Grant, Head Start, Lifeline, ACP, local
income taxes (NYC/Philadelphia/MD), AMT, and NIIT — with statute
citations and coverage rows grounded in variable paths and test counts.

Also resolves peer cells: TRIM3 models LIHEAP (Urban overview + ASPE
TRIM3 brief; the older boreas list is non-exhaustive) and ACA subsidies
live in Urban's sibling HIPSM model rather than TRIM3; TPC treats
SNAP/TANF/SSI as TRIM3-adjusted data inputs, not simulated programs;
Tax-Calculator is federal-only (no state EITC/CTC); and entitledto
includes the Scottish Child Payment.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
PolicyEngine US rows replace 'predicted value pending' placeholders
with actual model runs (2026-06-09, versions noted per row) against
2024 administrative targets: SNAP +2.3% vs USDA FY2024, EITC -3.2% vs
IRS TY2024, Medicaid enrollment +5.5% vs CMS December 2024, CTC +11.5%
vs the latest complete IRS SOI total, and SSI -13%/-22% with the
documented take-up gap stated plainly. Income tax is deliberately
excluded: the current enhanced CPS build overstates AGI via an inflated
miscellaneous_income imputation (tracked upstream in
policyengine-us-data#1107), and publishing that number would
misrepresent the model.

Peer benchmarks give the page independent grounding: the Census
Bureau's evaluation of TRIM3 (87% of the IRS income-tax target, 73% of
EITC, 101% of CTC) and TAXSIM (88%, 73%) for TY2012, Tax-Calculator's
sub-dollar agreement with TAXSIM-35, and six UKMOD simulated-vs-official
2023 benchmarks from the CeMPA country report, including the documented
37% Housing Benefit shortfall.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e facts

Usage rows replace blanket unknowns with verified, sourced facts: HM
Treasury's Algorithmic Transparency Record documenting its PolicyEngine
UK pilot (with The Times coverage), the Nuffield Foundation grant, a
US House press release citing PolicyEngine estimates, the Niskanen
Center CTC report, MyFriendBen and Benefit Navigator API integrations,
the NBER TAXSIM-emulator MoU, and exact GitHub/PyPI counts retrieved
2026-06-09.

Transparency rows fill contributor counts (132 US / 23 UK) and test
counts (21,697 / 1,084 named cases) with API-backed sources. The JOSS
citation is corrected everywhere: DOI 10.21105/joss.04494 belongs to an
unrelated paper; PolicyEngine's JOSS submission is under review, so the
rows now link the actual review thread and academicCitations stays
unknown.

Behavioral rows add the optional CBO-derived elasticity presets (income
-0.05, substitution 0.22-0.31 by decile, capital gains -0.79) clearly
marked off-by-default, complementing the existing static-default rows.
Modeling mechanics add state/congressional-district and local-authority/
constituency calibrated weights, the continuous test suite + TAXSIM
cross-validation, and the off-by-default behavioral-response design.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 9, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| policyengine-model | Ready | Preview, Comment | Jun 9, 2026 10:01pm |
