Make microimpute.Imputer the canonical regime-gated sequential imputer #193
Merged
When imputing a list of targets, each numeric target is now conditioned on the original predictors plus the previously-imputed targets, so the imputed vector preserves cross-variable joint structure instead of imputing each variable independently. This is the correct way to reproduce dependence that runs through the targets themselves (e.g. tax components on the same return) rather than only through the shared predictors.

New `sequential` parameter (default True); single-target lists unaffected. Added a test that a target pair correlated only through an unobserved latent factor is recovered by chaining (corr -0.92 vs true -0.93) but not by independent per-variable imputation (-0.21).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
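For readers who want to see why chaining matters, here is a minimal stdlib-only sketch of the idea in the commit message. It is not the microimpute implementation: a toy k-nearest-donor draw stands in for the QRF base model, and every name in it (`make_rows`, `draw_from_neighbours`, `impute`, the noise scale, `k`) is an assumption made for illustration. Two targets share an unobserved latent factor, while the only observed predictor `x` carries no information about it.

```python
import random


def pearson(a, b):
    """Plain Pearson correlation, written out to stay stdlib-only."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def make_rows(n, rng):
    """y1 and y2 share a latent factor z; the observed predictor x does not."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)  # latent factor, never stored in the row
        rows.append({
            "x": rng.gauss(0, 1),
            "y1": z + 0.2 * rng.gauss(0, 1),
            "y2": -z + 0.2 * rng.gauss(0, 1),
        })
    return rows


def draw_from_neighbours(donors, query, features, target, rng, k):
    """Draw `target` from a random donor among the k nearest on `features`."""
    def distance(donor):
        return sum((donor[f] - query[f]) ** 2 for f in features)

    nearest = sorted(donors, key=distance)[:k]
    return rng.choice(nearest)[target]


def impute(donors, receivers, targets, chained, rng, k=10):
    """Impute targets in order; if chained, each imputed target becomes a
    predictor for the targets that follow it."""
    imputed = []
    for receiver in receivers:
        row = {"x": receiver["x"]}
        features = ["x"]
        for target in targets:
            row[target] = draw_from_neighbours(
                donors, row, features, target, rng, k
            )
            if chained:
                features = features + [target]
        imputed.append(row)
    return imputed


def corr_of(rows):
    return pearson([r["y1"] for r in rows], [r["y2"] for r in rows])


rng = random.Random(0)
donors = make_rows(600, rng)
receivers = make_rows(300, rng)  # only "x" is read from these rows

true_corr = corr_of(donors)
chained_corr = corr_of(impute(donors, receivers, ["y1", "y2"], True, rng))
independent_corr = corr_of(impute(donors, receivers, ["y1", "y2"], False, rng))

print(f"true={true_corr:.2f} chained={chained_corr:.2f} "
      f"independent={independent_corr:.2f}")
```

In this toy setup the chained run should land close to the donor correlation, while the independent run should stay near zero, because `x` alone cannot carry the dependence between the two targets.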
Summary
Makes `microimpute.Imputer` the canonical, opinionated imputer (sign-regime gating + QRF base + sequential chained-equations imputation, all on by default) and bakes in what we know works while exposing knobs only for what's worth A/B-testing.

This started as "add chaining to `ZeroInflatedImputer`" and grew into the right API:

- `sequential` flag. Turning chaining off is never what you want.
- `signregime: bool = True`: the `{neg, 0, pos}` gate (which fixes the "QRF on `y>0` drops the negative tail" bug, e.g. capital/business losses). `signregime=False` imputes with the base model directly, for comparison.
- `base_imputer_class` (default `QRF`): the model knob, for experiments (OLS, MDN, …).
- `ZeroInflatedImputer` → `Imputer` (it was a misnomer: it's a regime-gated/hurdle model, not zero-inflation); the old abstract base `Imputer` → `BaseImputer` (still exported). Breaking; see migration below.
- `.lineage()` returns per-variable `VariableLineage`: regime, the chained predictor set, training-support counts, the fitted models, and feature importances where the model exposes them.

Migration
- `from microimpute.models.zero_inflated import ZeroInflatedImputer` → `from microimpute import Imputer`
- `Imputer` (subclassing / isinstance) → `BaseImputer`

Testing
- `test_regime_gated_chaining.py`: a target pair correlated only through an unobserved latent factor is recovered by one chained list call (corr −0.92 vs true −0.93) but not by separate per-variable calls (the old microplex per-column pattern); `.lineage()` reports chained predictors/metrics/importances; `signregime=False` disables gating.
- `test_autoimpute` failures are pre-existing: they reference the optional rpy2-backed `Matching` imputer, absent in this env.
- `black --line-length 79` + `isort` clean.

🤖 Generated with Claude Code
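To make the `{neg, 0, pos}` gate from the summary concrete, here is a rough stdlib-only sketch of a two-stage (hurdle-style) draw. It is an assumption about how such a gate can work in general, not the code in this PR: it ignores predictors entirely, uses empirical resampling instead of a QRF, and all names (`split_by_regime`, `draw_gated`, `draw_positive_only`) and the 50/35/15 regime mix are hypothetical. It contrasts the gate with a zero-vs-positive hurdle, which is one way the negative tail can get dropped.

```python
import random


def regime(value):
    """Sign regime of a value: 'neg', 'zero', or 'pos'."""
    if value < 0:
        return "neg"
    if value > 0:
        return "pos"
    return "zero"


def split_by_regime(values):
    """Group donor values by sign regime (the per-regime training support)."""
    groups = {"neg": [], "zero": [], "pos": []}
    for value in values:
        groups[regime(value)].append(value)
    return groups


def draw_gated(groups, rng):
    """Stage 1: pick a regime in proportion to its support.
    Stage 2: draw a value from donors inside that regime."""
    names = [name for name in groups if groups[name]]
    weights = [len(groups[name]) for name in names]
    chosen = rng.choices(names, weights=weights)[0]
    if chosen == "zero":
        return 0.0
    return rng.choice(groups[chosen])


def draw_positive_only(groups, rng):
    """Zero-vs-positive hurdle: negatives are never modelled, so every
    non-positive donor collapses to an imputed zero."""
    total = sum(len(members) for members in groups.values())
    if rng.random() < len(groups["pos"]) / total:
        return rng.choice(groups["pos"])
    return 0.0


rng = random.Random(0)

# Hypothetical donor column: about half zeros, a positive body, and a
# negative tail (think capital or business losses).
donor_values = []
for _ in range(2000):
    u = rng.random()
    if u < 0.50:
        donor_values.append(0.0)
    elif u < 0.85:
        donor_values.append(rng.lognormvariate(0, 1))
    else:
        donor_values.append(-rng.lognormvariate(0, 1))

groups = split_by_regime(donor_values)
gated = [draw_gated(groups, rng) for _ in range(2000)]
positive_only = [draw_positive_only(groups, rng) for _ in range(2000)]

neg_share_gated = sum(v < 0 for v in gated) / len(gated)
neg_share_positive_only = sum(v < 0 for v in positive_only) / len(positive_only)
print(f"negative share: gated={neg_share_gated:.2f} "
      f"positive-only={neg_share_positive_only:.2f}")
```

In a real imputer both stages would be conditional on the predictors (and, with chaining, on previously imputed targets); the sketch only shows why a separate negative regime keeps losses in the imputed distribution.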