Make microimpute.Imputer the canonical regime-gated sequential imputer by MaxGhenis · Pull Request #196 · PolicyEngine/microimpute

MaxGhenis · 2026-06-06T20:26:41Z

Summary

Makes microimpute.Imputer the canonical, opinionated imputer — sign-regime gating + QRF base + sequential chained-equations imputation, all on by default — and renames the abstract base to BaseImputer. Follow-up to #193 (chaining) and built on top of #195 (QRF feature-order), both of which are preserved.

- Sequential chaining is always on (no `sequential` flag): imputing a list of targets conditions each target on the previously imputed ones, so the imputed vector preserves cross-variable joint structure. The old per-variable-independent path (which was accidentally comonotonic because every variable used an identical base seed) is gone.
- `signregime: bool = True`: the {neg, 0, pos} gate, which fixes the "QRF on y>0 drops the negative tail" bug (e.g. capital/business losses); `signregime=False` imputes with the base model directly.
- `base_imputer_class=QRF` (default): the model knob, for experiments.
- Rename: `ZeroInflatedImputer` → `Imputer` (the old name was a misnomer: the model is regime-gated/hurdle, not zero-inflated); abstract base `Imputer` → `BaseImputer` (still exported). Breaking; see Migration.
- sklearn-shaped fitted state instead of a bespoke lineage object: the result exposes `regimes_`, `predictors_` (the chained predictor set per target), and `models_` (`{var: {role: estimator}}`), and `QRFResults` carries standard `feature_importances_` (keyed by the forest's fitted columns, so names and values are self-consistent) and `feature_names_in_`. Full per-variable lineage is assembled by callers (microplex), not microimpute.
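
To make the sign-regime gate concrete, here is a minimal, stdlib-only sketch. It is not microimpute's implementation: the real gate and per-regime models condition on predictors (QRF by default), whereas this toy is unconditional, and the names `fit_sign_gate` and `impute_gated` are hypothetical. It only illustrates why a model fit on `y > 0` alone cannot reproduce losses, while a {neg, 0, pos} gate can.

```python
import random


def fit_sign_gate(donor_values):
    """Split donor values into sign regimes and record each regime's share."""
    regimes = {"neg": [], "zero": [], "pos": []}
    for value in donor_values:
        label = "neg" if value < 0 else "pos" if value > 0 else "zero"
        regimes[label].append(value)
    total = len(donor_values)
    shares = {label: len(vals) / total for label, vals in regimes.items()}
    return regimes, shares


def impute_gated(regimes, shares, n_draws, seed=0):
    """Draw a regime first (the gate), then a donor value within that regime."""
    rng = random.Random(seed)
    labels = [label for label, vals in regimes.items() if vals]
    weights = [shares[label] for label in labels]
    draws = []
    for _ in range(n_draws):
        label = rng.choices(labels, weights=weights, k=1)[0]
        draws.append(rng.choice(regimes[label]))
    return draws


# Illustrative donor data with losses, zeros, and gains (made-up numbers).
donors = [-500.0, -120.0, 0.0, 0.0, 0.0, 0.0, 80.0, 150.0, 900.0, 2400.0]
regimes, shares = fit_sign_gate(donors)

# Gated imputation keeps all three sign classes.
gated = impute_gated(regimes, shares, n_draws=1000)

# A "positive-only" model can never emit a loss or a zero.
rng = random.Random(1)
positive_only = [rng.choice(regimes["pos"]) for _ in range(1000)]

print(shares)
print("gated min:", min(gated), "| positive-only min:", min(positive_only))
```

In this toy the gate is just the empirical share of each sign class; in the PR's design the gate is a classifier and each regime gets its own base imputer.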

Migration

- `from microimpute.models.zero_inflated import ZeroInflatedImputer` → `from microimpute import Imputer`
- old base class `Imputer` (subclass/isinstance) → `BaseImputer`

Testing

- `tests/test_models/`: 136 passed, 3 skipped (skips = optional rpy2 Matching).
- `ruff format --check` clean.
- #195 (Preserve QRF feature order during prediction) is preserved (verified: `_align_features` byte-identical; `not_numeric_categorical` threading intact). Reviewed via an independent review-fix cycle that caught and fixed an earlier branch's accidental revert of #195 plus a `feature_importances_`/`feature_names_in_` misalignment bug; round-2 review found no actionable issues.

🤖 Generated with Claude Code

…style fitted state

Rename ZeroInflatedImputer to the canonical microimpute.Imputer (regime-gated, QRF-based, sequentially-chained), make it the opinionated default, and rename the abstract base class Imputer -> BaseImputer. The module moves models/zero_inflated.py -> models/regime_gated.py and the result class becomes the internal RegimeGatedImputerResults.

- Chaining is always on: each numeric target conditions on the original predictors plus the previously-imputed numeric targets (the `sequential` constructor flag is removed).
- New `signregime: bool = True` arg; `signregime=False` skips regime detection and imputes each numeric target with a single base imputer over the full training set (the REGIME_NO_GATE path).
- The fitted result exposes sklearn-style state: regimes_ ({var: regime}), predictors_ ({var: chained predictor list}), models_ ({var: {role: estimator}} with roles single/gate/positive/negative). No lineage()/VariableLineage.
- QRFResults gains feature_names_in_ (original input predictor names, ndarray) and feature_importances_ ({fitted_feature: importance} dict keyed by the forest's ACTUAL fitted columns; {var: {feature: importance}} for multiple variables; AttributeError for non-QRF bases). Keying by the fitted columns fixes the prior name/value misalignment under categorical (dummy-expanded) predictors.

Preserves PR #195 in full: regime_gated.py still threads not_numeric_categorical into every nested base fit, and qrf.py keeps the _align_features/feature_columns prediction-order tracking byte-for-byte.

Tests: test_zero_inflated.py -> test_regime_gated.py (ported, classes renamed, #195 not_numeric_categorical tests kept); test_qrf.py keeps the #195 reorder test and adds feature_importances_/feature_names_in_ self-consistency regression tests (including the categorical-predictor case that was broken before). test_regime_gated_chaining.py replaces test_zero_inflated_chaining.py and adds chaining-recovery, fitted-attributes, and signregime=False coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
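
The chaining rule described above (each target conditions on the original predictors plus the targets imputed before it) can be sketched as follows. This is an illustrative sketch only: `chained_predictor_sets` and the variable names are hypothetical, and the returned dict merely mirrors the `{var: chained predictor list}` shape the commit message gives for `predictors_`.

```python
def chained_predictor_sets(predictors, targets):
    """Return {target: predictor list}, growing the list as targets are imputed."""
    chain = {}
    available = list(predictors)
    for target in targets:
        # This target sees the original predictors plus all earlier targets.
        chain[target] = list(available)
        # Once imputed, it becomes a predictor for every later target.
        available.append(target)
    return chain


# Hypothetical variable names, for illustration only.
sets = chained_predictor_sets(
    predictors=["age", "is_married"],
    targets=["wages", "capital_gains", "business_income"],
)
for target, preds in sets.items():
    print(target, "<-", preds)
```

The order of `targets` matters under this scheme: the first target is fit on the original predictors alone, and the last one sees every earlier target.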

vercel · 2026-06-06T20:26:43Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
microimpute-dashboard	Ready	Preview, Comment	Jun 10, 2026 3:20am

Imputer.fit(weight_col=...) only forwarded weights to the auxiliary base imputer for non-numeric targets. The per-variable numeric chain called _fit_single_numeric, which neither accepted nor forwarded weights, so regime gate classifiers and per-regime base imputers all fit unweighted: weighted and unweighted fits produced identical imputations for every numeric target.

Two layers needed fixing:

- regime_gated.py: resolve weights once in fit() (column name, array, or Series; BaseImputer's inline resolution is extracted into a reusable _resolve_sample_weights helper) and thread the per-row vector through _fit_single_numeric into every nested fit: gate classifiers receive it as sample_weight, and _fit_base_single forwards it as weight_col to each per-regime base imputer, sliced with the same row mask as the training slice (positive/negative/nonzero parts).
- qrf.py: passing sample_weight natively to RandomForestQuantileRegressor / RandomForestClassifier is a no-op for the predictive distribution: quantile_forest only uses it as a zero-weight filter when assembling leaf membership, and fully-grown forest leaves hold one training sample each, so weighted impurity moves nothing either. The QRF model classes now materialize weights by weighted bootstrap resampling of their training rows (_weighted_resample) before fitting.

_detect_regime stays unweighted deliberately: it is a structural check of which sign classes appear in the donor data (presence-based support detection), not an estimate, so sampling weights do not apply.

Verified with a controlled repro (donor mixing a heavily-downweighted y~1e6 block into a weight-1 y~1e4 block): before, weight_col=None and weight_col="weight" both imputed mean ~202,329; after, the weighted fit imputes ~11,090 (true weighted mean 10,755) while the unweighted fit is unchanged. Regression tests cover both the no-gate numeric path and the ZI_POSITIVE gate classifier path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
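
The idea of materializing weights by weighted bootstrap resampling can be sketched with the standard library alone. This is an assumption-laden illustration, not the PR's `_weighted_resample`: the function name `weighted_resample`, the block sizes, and the weights below are made up, and they do not reproduce the repro figures quoted above.

```python
import random


def weighted_resample(rows, weights, seed=0):
    """Draw len(rows) rows with replacement, with probability proportional to weight."""
    if len(rows) != len(weights):
        raise ValueError("rows and weights must have the same length")
    rng = random.Random(seed)
    return rng.choices(rows, weights=weights, k=len(rows))


# Illustrative donor data: a weight-1 block near 1e4 mixed with a heavily
# down-weighted block near 1e6 (numbers chosen for the example only).
values = [10_000.0] * 900 + [1_000_000.0] * 100
weights = [1.0] * 900 + [0.001] * 100

unweighted_mean = sum(values) / len(values)
weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)

resampled = weighted_resample(values, weights)
resampled_mean = sum(resampled) / len(resampled)

print(f"unweighted mean: {unweighted_mean:,.0f}")
print(f"weighted mean:   {weighted_mean:,.0f}")
print(f"resampled mean:  {resampled_mean:,.0f}")
```

Any estimator fit on `resampled`, even one that ignores sample weights entirely, then sees training data whose composition reflects the weights. The cost is extra Monte Carlo noise from the resampling step.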

vercel Bot deployed to Preview June 6, 2026 20:27 View deployment

MaxGhenis mentioned this pull request Jun 9, 2026

Run CI on claude/spec-driven-engine and raise Python floor to 3.12 PolicyEngine/microplex#75

Open

vercel Bot deployed to Preview June 10, 2026 03:20 View deployment

MaxGhenis mentioned this pull request Jun 10, 2026

populace-fit: weight-aware conditional models (regime-gated chained QRF) PolicyEngine/populace#2

Draft

MaxGhenis wants to merge 2 commits into main from claude/imputer-canonical
