Make the model comparison properly reflect PolicyEngine — audited data + host-model UI by MaxGhenis · Pull Request #43 · PolicyEngine/policyengine-model

MaxGhenis · 2026-06-09T22:01:02Z

Improves the open microsimulation reference so it accurately and richly represents PolicyEngine alongside its peers, on both the data and UI sides.

Make the comparison reflect the actual model

Every PolicyEngine row was audited against the model code at v1.722.5 (2026-06-09):

Coverage corrections (9 rows): estate tax is partial, not not-implemented (IRC §2001(c) schedule + §2010 unified credit are implemented over an exogenous taxable estate); OASI notes now document the full PIA/AIME benefit engine (88 test cases) alongside the survey-reported microsim input; UI notes the NJ + PA benefit-determination engines; LIHEAP drops a nonexistent federal-module claim in favor of the actual DC/IL/MA/TX(+Riverside County) implementations; CCDF (~21 states), TANF (28 + DC), SSI state supplements (20 states), state EITC (28 + DC), and state CTC (17 + DC) counts now match the code; Section 8 is partial per the model's own metadata.
8 new programs PolicyEngine models end to end that the matrix lacked: school meals, Pell Grant, Head Start, Lifeline, ACP, local income taxes (NYC/Philadelphia/MD counties), AMT, and NIIT — each grounded in variable paths and test counts.
Accuracy benchmarks are now real. The placeholder "predicted value pending" rows are replaced with actual model runs against 2024 administrative targets: SNAP +2.3% (USDA FY2024), EITC −3.2% (IRS TY2024), Medicaid enrollment +5.5% (CMS Dec 2024), CTC +11.5% (IRS SOI TY2022), SSI −13%/−22% with the take-up gap stated plainly. Income tax is deliberately excluded: the current enhanced CPS build overstates AGI via an inflated miscellaneous_income imputation (tracked in policyengine-us-data#1107: enhanced_cps_2024 overshoots the CBO income_tax target by ~1.86x across 2024-2026 because loss weighting drowns out aggregate targets), so publishing it would misrepresent the model.
Transparency counts filled: 132/23 named contributors, 21,697/1,084 named test cases (US/UK), with API-backed sources.
Usage rows replace blanket unknowns with verified facts: HM Treasury's Algorithmic Transparency Record on its PolicyEngine UK pilot, The Times coverage, the Nuffield grant, a US House press release citing PolicyEngine, Niskanen's CTC report, MyFriendBen + Benefit Navigator API integrations, and the NBER TAXSIM-emulator MoU.
Citation fix: the JOSS DOI cited in models/transparency (10.21105/joss.04494) belongs to an unrelated paper; rows now link the actual under-review JOSS submission and keep academicCitations: unknown.
Behavioral rows now show both sides: static-by-default stays, and the optional CBO-derived presets (income −0.05, substitution 0.22–0.31 by decile, capital gains −0.79) are documented as off-by-default capabilities.
Modeling mechanics added: state/congressional-district and local-authority/constituency calibrated weights, continuous test suite + TAXSIM cross-validation, off-by-default behavioral-response design.

Fairer, deeper peer rows

TRIM3 models LIHEAP (Urban overview + ASPE TRIM3 brief; the older boreas list is non-exhaustive), and ACA marketplace subsidies live in Urban's sibling HIPSM model rather than in TRIM3.
Independent Census Bureau evaluation benchmarks for TRIM3 (87% of the IRS income-tax target, 73% EITC, 101% CTC) and TAXSIM (88%, 73%) for TY2012; Tax-Calculator's sub-dollar TAXSIM-35 agreement; six UKMOD simulated-vs-official 2023 benchmarks including the documented 37% Housing Benefit shortfall.
TPC SNAP/TANF/SSI marked not-implemented with the primary-source quote (TRIM3 tabulations adjust reported amounts; programs are not simulated); Tax-Calculator marked federal-only for state credits; entitledto covers the Scottish Child Payment.

UI: the host model reads as "ours"

PolicyEngine column/row gets a teal tint + a "This model" chip across the coverage matrix, validation, behavioral, calibration, pipeline, and About panels.
Sticky program column keeps context while scrolling 23-peer matrices.
Compare drawer groups peers by sector (Government / Non-profit / Academic / For-profit) with model-type sublabels.
Validation rows show percent deviation vs target ("+2.3% vs target", "within 0.1% of target") and format GBP benchmarks.
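
As a rough illustration of the deviation labels described above, the sketch below computes percent deviation as (predicted − target) / target and renders it in the two forms quoted in this PR. It is not the PR's implementation: the names `formatDeviation` and `formatBenchmark`, the 0.1-point "within" threshold, and the compact currency style are all assumptions made for the example.

```typescript
// Hypothetical helpers; names, threshold, and formatting options are
// assumptions for illustration, not code taken from this PR.
type Currency = "USD" | "GBP";

// Percent deviation of a model estimate from its administrative target,
// rendered either as "+2.3% vs target" or "within 0.1% of target".
function formatDeviation(
  predicted: number,
  target: number,
  withinThresholdPct = 0.1,
): string {
  // Guard against missing values and division by zero.
  if (!Number.isFinite(predicted) || !Number.isFinite(target) || target === 0) {
    return "n/a";
  }
  const pct = ((predicted - target) / target) * 100;
  if (Math.abs(pct) <= withinThresholdPct) {
    return `within ${withinThresholdPct}% of target`;
  }
  const sign = pct > 0 ? "+" : "-";
  return `${sign}${Math.abs(pct).toFixed(1)}% vs target`;
}

// Format a benchmark amount in its own currency (GBP for UK rows).
function formatBenchmark(value: number, currency: Currency): string {
  const locale = currency === "GBP" ? "en-GB" : "en-US";
  return new Intl.NumberFormat(locale, {
    style: "currency",
    currency,
    notation: "compact",
    maximumFractionDigits: 1,
  }).format(value);
}

console.log(formatDeviation(102.3, 100)); // "+2.3% vs target"
console.log(formatDeviation(96.8, 100)); // "-3.2% vs target"
console.log(formatDeviation(100.05, 100)); // "within 0.1% of target"
console.log(formatBenchmark(1_200_000_000, "GBP"));
```

The same arithmetic links the two reporting styles on the page: a peer benchmark stated as "87% of the target" corresponds to a deviation of −13%.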

Tests: 77 passing (11 new), bun run build clean, eslint . clean.

🤖 Generated with Claude Code

Comparison tables now distinguish 'this model' from peers: the PolicyEngine column/row gets a teal tint and a 'This model' chip in the coverage matrix, validation benchmarks, behavioral tables, calibration, pipeline, and About panels. The coverage matrix pins the program column while scrolling horizontally, the compare drawer groups peers by sector with model-type sublabels, and validation rows show percent deviation from the administrative target (with GBP formatting for UK benchmarks).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
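
The sector grouping in the compare drawer could be sketched roughly as follows. This is a minimal example under stated assumptions, not the component code from this PR: the `PeerModel` shape, the `groupBySector` name, and the placeholder peers are hypothetical, and only the four sector labels come from the description above.

```typescript
// Hypothetical data shape; only the four sector labels come from the PR text.
type Sector = "Government" | "Non-profit" | "Academic" | "For-profit";

interface PeerModel {
  name: string;
  sector: Sector;
  modelType: string; // shown as the sublabel under the peer name
}

// Display order of the sector groups in the drawer.
const SECTOR_ORDER: Sector[] = ["Government", "Non-profit", "Academic", "For-profit"];

// Group peers by sector, keeping the fixed sector order and dropping
// sectors that have no peers.
function groupBySector(
  peers: PeerModel[],
): { sector: Sector; peers: PeerModel[] }[] {
  const buckets = new Map<Sector, PeerModel[]>();
  for (const peer of peers) {
    const bucket = buckets.get(peer.sector) ?? [];
    bucket.push(peer);
    buckets.set(peer.sector, bucket);
  }
  return SECTOR_ORDER.filter((sector) => buckets.has(sector)).map((sector) => ({
    sector,
    peers: buckets.get(sector) ?? [],
  }));
}

// Placeholder peers, not real classifications of any model.
const examplePeers: PeerModel[] = [
  { name: "Model A", sector: "Academic", modelType: "Tax calculator" },
  { name: "Model B", sector: "Government", modelType: "Microsimulation" },
  { name: "Model C", sector: "Academic", modelType: "Microsimulation" },
];

const grouped = groupBySector(examplePeers);
console.log(grouped.map((g) => `${g.sector}: ${g.peers.map((p) => p.name).join(", ")}`));
```

Keeping the order in a constant array rather than sorting alphabetically means the drawer lists sectors in the same sequence regardless of which peers are present.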

Corrects nine PolicyEngine US rows that no longer matched the code: estate tax is partial (the IRC 2001(c) schedule and unified credit are implemented over an exogenous taxable estate), OASI documents the full PIA/AIME benefit engine alongside the survey-reported microsim input, UI notes the NJ and PA benefit engines, LIHEAP corrects a nonexistent federal-module claim to the DC/IL/MA/TX(+Riverside) state programs, CCDF updates to the ~21-state ruleset, TANF quantifies 28 states + DC, SSI state supplements list all 20 modeled states, state EITC/CTC counts match the code (28+DC, 17+DC), and Section 8 is partial per the model's own metadata (AMI inputs cover selected geographies).

Adds eight programs PolicyEngine models end to end that the matrix lacked (school meals, Pell Grant, Head Start, Lifeline, ACP, local income taxes (NYC/Philadelphia/MD), AMT, and NIIT), with statute citations and coverage rows grounded in variable paths and test counts.

Also resolves peer cells: TRIM3 models LIHEAP (Urban overview + ASPE TRIM3 brief; the older boreas list is non-exhaustive) and ACA subsidies live in Urban's sibling HIPSM model rather than TRIM3; TPC treats SNAP/TANF/SSI as TRIM3-adjusted data inputs, not simulated programs; Tax-Calculator is federal-only (no state EITC/CTC); and entitledto includes the Scottish Child Payment.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

PolicyEngine US rows replace 'predicted value pending' placeholders with actual model runs (2026-06-09, versions noted per row) against 2024 administrative targets: SNAP +2.3% vs USDA FY2024, EITC -3.2% vs IRS TY2024, Medicaid enrollment +5.5% vs CMS December 2024, CTC +11.5% vs the latest complete IRS SOI total, and SSI -13%/-22% with the documented take-up gap stated plainly. Income tax is deliberately excluded: the current enhanced CPS build overstates AGI via an inflated miscellaneous_income imputation (tracked upstream in policyengine-us-data#1107), and publishing that number would misrepresent the model.

Peer benchmarks give the page independent grounding: the Census Bureau's evaluation of TRIM3 (87% of the IRS income-tax target, 73% of EITC, 101% of CTC) and TAXSIM (88%, 73%) for TY2012, Tax-Calculator's sub-dollar agreement with TAXSIM-35, and six UKMOD simulated-vs-official 2023 benchmarks from the CeMPA country report, including the documented 37% Housing Benefit shortfall.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…e facts

Usage rows replace blanket unknowns with verified, sourced facts: HM Treasury's Algorithmic Transparency Record documenting its PolicyEngine UK pilot (with The Times coverage), the Nuffield Foundation grant, a US House press release citing PolicyEngine estimates, the Niskanen Center CTC report, MyFriendBen and Benefit Navigator API integrations, the NBER TAXSIM-emulator MoU, and exact GitHub/PyPI counts retrieved 2026-06-09.

Transparency rows fill contributor counts (132 US / 23 UK) and test counts (21,697 / 1,084 named cases) with API-backed sources. The JOSS citation is corrected everywhere: DOI 10.21105/joss.04494 belongs to an unrelated paper; PolicyEngine's JOSS submission is under review, so the rows now link the actual review thread and academicCitations stays unknown.

Behavioral rows add the optional CBO-derived elasticity presets (income -0.05, substitution 0.22-0.31 by decile, capital gains -0.79) clearly marked off-by-default, complementing the existing static-default rows. Modeling mechanics add state/congressional-district and local-authority/constituency calibrated weights, the continuous test suite + TAXSIM cross-validation, and the off-by-default behavioral-response design.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

vercel · 2026-06-09T22:01:03Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
policyengine-model	Ready	Preview, Comment	Jun 9, 2026 10:01pm

MaxGhenis and others added 4 commits June 9, 2026 23:58

vercel Bot deployed to Preview June 9, 2026 22:01

MaxGhenis wants to merge 4 commits into master from improve-model-comparisons
