Make microimpute.Imputer the canonical regime-gated sequential imputer #193

Merged: MaxGhenis merged 2 commits into main from claude/zeroinflated-chaining on Jun 6, 2026
@MaxGhenis MaxGhenis commented Jun 6, 2026

Summary

Makes microimpute.Imputer the canonical, opinionated imputer — sign-regime gating + QRF base + sequential chained-equations imputation, all on by default — and bakes in what we know works while exposing knobs only for what's worth A/B-testing.

This started as "add chaining to ZeroInflatedImputer" and grew into the right API:

  • Sequential chaining is always on (and was the missing piece). Imputing a list of targets conditions each on the previously-imputed ones, so the imputed vector preserves cross-variable joint structure. The old per-variable-independent path — which was accidentally comonotonic (each target's base QRF is seeded identically, so they drew the same per-row quantile) — is gone, along with its sequential flag. Turning chaining off is never what you want.
  • signregime: bool = True — the {neg, 0, pos} gate (which fixes the "QRF on y>0 drops the negative tail" bug, e.g. capital/business losses). signregime=False imputes with the base model directly, for comparison.
  • base_imputer_class (default QRF) — the model knob, for experiments (OLS, MDN, …).
  • Rename: ZeroInflatedImputer → Imputer (it was a misnomer: it's a regime-gated/hurdle model, not zero-inflation); the old abstract base Imputer → BaseImputer (still exported). Breaking; see migration below.
  • Lineage: the fitted result's .lineage() returns per-variable VariableLineage: regime, the chained predictor set, training-support counts, the fitted models, and feature importances where the model exposes them.
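
To make the {neg, 0, pos} gate concrete, here is a minimal stdlib-only sketch of the idea, not microimpute's implementation. The helper names `fit_regime_gate` and `draw` are hypothetical, the regime is drawn from unconditional training shares rather than predicted from covariates, and resampling within a regime stands in for the per-regime base model (QRF by default in the PR).

```python
import random


def fit_regime_gate(y):
    """Split training values into sign regimes and record each regime's share."""
    groups = {"neg": [], "zero": [], "pos": []}
    for v in y:
        key = "neg" if v < 0 else "pos" if v > 0 else "zero"
        groups[key].append(v)
    shares = {k: len(vs) / len(y) for k, vs in groups.items()}
    return groups, shares


def draw(groups, shares, rng):
    """Draw a regime by its training share, then a value from that regime."""
    regime = rng.choices(list(shares), weights=list(shares.values()))[0]
    if regime == "zero":
        return 0.0
    # Stand-in for a per-regime base model: resample a training value.
    return rng.choice(groups[regime])


rng = random.Random(0)

# Toy target: 20% losses, 50% exact zeros, 30% gains.
train = (
    [-rng.expovariate(1.0) for _ in range(200)]
    + [0.0] * 500
    + [rng.expovariate(1.0) for _ in range(300)]
)

groups, shares = fit_regime_gate(train)
gated = [draw(groups, shares, rng) for _ in range(5000)]

# The y > 0 only pattern: a model that never sees losses cannot produce them.
positive_only = [rng.choice(groups["pos"]) for _ in range(5000)]

neg_share_gated = sum(v < 0 for v in gated) / len(gated)
neg_share_positive_only = sum(v < 0 for v in positive_only) / len(positive_only)
print(neg_share_gated, neg_share_positive_only)
```

In this toy setup the gated draws keep roughly the training share of negative values, while the positive-only draws contain none, which is the dropped negative tail the gate is meant to fix.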

Migration

  • from microimpute.models.zero_inflated import ZeroInflatedImputer → from microimpute import Imputer
  • references to the old base class Imputer (subclassing / isinstance) → BaseImputer

Testing

  • test_regime_gated_chaining.py: a target pair correlated only through an unobserved latent factor is recovered by one chained list call (corr −0.92 vs true −0.93) but not by separate per-variable calls (the old microplex per-column pattern); .lineage() reports chained predictors/metrics/importances; signregime=False disables gating.
  • Full suite: 312 passed, 3 skipped. The 8 test_autoimpute failures are pre-existing — they reference the optional rpy2-backed Matching imputer, absent in this env.
  • black --line-length 79 + isort clean.

🤖 Generated with Claude Code

When imputing a list of targets, each numeric target is now conditioned on
the original predictors plus the previously-imputed targets, so the imputed
vector preserves cross-variable joint structure instead of imputing each
variable independently. This is the correct way to reproduce dependence
that runs through the targets themselves (e.g. tax components on the same
return) rather than only through the shared predictors.

New `sequential` parameter (default True); single-target lists unaffected.
Added a test that a target pair correlated only through an unobserved latent
factor is recovered by chaining (corr -0.92 vs true -0.93) but not by
independent per-variable imputation (-0.21).
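
The mechanism above can be reproduced in a small stdlib-only sketch. It is not the PR's test: the base model here is ordinary least squares with Gaussian residual draws rather than QRF, and the coefficients, sample sizes, and helper names (`solve`, `fit_ols`, `draw`, `simulate`) are all illustrative assumptions. Two targets share an unobserved factor `z`; the second target is imputed once from `x` alone and once from `x` plus the already-imputed first target.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a @ w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))  # partial pivoting
        m[i], m[p] = m[p], m[i]
        for r in range(n):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [u - f * v for u, v in zip(m[r], m[i])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares fit with intercept; returns (coefficients, residual sd)."""
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(xs, y)) for i in range(k)]
    w = solve(xtx, xty)
    resid = [t - sum(wi * xi for wi, xi in zip(w, x)) for x, t in zip(xs, y)]
    return w, math.sqrt(sum(e * e for e in resid) / (len(y) - k))


def draw(model, rows, rng):
    """Stochastic imputation: fitted mean plus a Gaussian residual draw."""
    w, sd = model
    return [
        w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) + rng.gauss(0, sd)
        for r in rows
    ]


def corr(a, b):
    """Pearson correlation."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def simulate(n, rng):
    """x is observed; z is a latent factor that drives y1 and y2 oppositely."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [rng.gauss(0, 1) for _ in range(n)]
    y1 = [0.3 * xi + zi + rng.gauss(0, 0.3) for xi, zi in zip(x, z)]
    y2 = [0.3 * xi - zi + rng.gauss(0, 0.3) for xi, zi in zip(x, z)]
    return x, y1, y2


rng = random.Random(0)
x_don, y1_don, y2_don = simulate(4000, rng)  # donor data with targets
x_rec, _, _ = simulate(4000, rng)            # receiver data, targets unknown

# First target: conditioned on the original predictor only.
y1_imp = draw(fit_ols([[v] for v in x_don], y1_don), [[v] for v in x_rec], rng)

# Independent: second target also conditioned on x only.
y2_ind = draw(fit_ols([[v] for v in x_don], y2_don), [[v] for v in x_rec], rng)

# Chained: second target conditioned on x plus the previously imputed y1.
chain_model = fit_ols([[a, b] for a, b in zip(x_don, y1_don)], y2_don)
y2_chain = draw(chain_model, [[a, b] for a, b in zip(x_rec, y1_imp)], rng)

true_corr = corr(y1_don, y2_don)
independent_corr = corr(y1_imp, y2_ind)
chained_corr = corr(y1_imp, y2_chain)
print(true_corr, independent_corr, chained_corr)
```

With these toy coefficients the population correlation is about -0.77. The chained draw should land close to the donor correlation, while the independent draw retains only the weak dependence that runs through `x`.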

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| microimpute-dashboard | Ready | Preview, Comment | Jun 6, 2026 6:29am |

@MaxGhenis MaxGhenis marked this pull request as ready for review June 6, 2026 06:42
@MaxGhenis MaxGhenis merged commit 36c10ba into main Jun 6, 2026
7 checks passed
@MaxGhenis MaxGhenis changed the title Sequential (chained-equations) imputation in ZeroInflatedImputer Make microimpute.Imputer the canonical regime-gated sequential imputer Jun 6, 2026