
Add MIMIC-III circulatory failure dataset and prediction task with ablation study #1096

Open
kywang0906 wants to merge 14 commits into sunlabuiuc:master from kywang0906:bella-mimic3-cf


kywang0906 commented Apr 22, 2026

Summary

This PR adds a new dataset and task for early prediction of circulatory failure on MIMIC-III, inspired by FAMEWS, a fairness auditing tool for medical early-warning systems.

The contribution includes:

  • A custom dataset (MIMIC3CirculatoryFailureDataset) for cohort construction and MAP time-series extraction
  • A task (CirculatoryFailurePredictionTask) for generating time-point prediction samples
  • Example scripts demonstrating task usage and ablation studies
  • Unit tests using synthetic data

Paper Reference

This work is inspired by:

FAMEWS: a Fairness Auditing tool for Medical Early-Warning Systems

The task design follows the early-warning prediction setting described in the paper, focusing on predicting circulatory failure events within a future time window.
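The early-warning setting can be sketched as time-point labeling: each measurement time gets a positive label if a failure event falls inside the following window. This is a minimal illustration, not the code in this PR. The function name `label_time_points`, the half-open interval convention, and the 12-hour default are assumptions made for the example.

```python
from bisect import bisect_right
from typing import List, Sequence


def label_time_points(
    times: Sequence[float],
    failure_times: Sequence[float],
    window_hours: float = 12.0,
) -> List[int]:
    """Return 1 for each time t with a failure event in (t, t + window_hours].

    All times are hours since admission. The interval convention is an
    assumption for this sketch, not taken from the PR.
    """
    events = sorted(failure_times)
    labels = []
    for t in times:
        # Index of the first event strictly after t.
        idx = bisect_right(events, t)
        has_event = idx < len(events) and events[idx] <= t + window_hours
        labels.append(int(has_event))
    return labels


# Hypothetical stay: measurements every 4 hours, one failure at hour 20.
times = [0.0, 4.0, 8.0, 12.0, 16.0, 20.0]
labels = label_time_points(times, failure_times=[20.0], window_hours=12.0)
print(labels)  # [0, 0, 1, 1, 1, 0]
```

The point at hour 20 is labeled 0 here because the event is not strictly in its future; whether to drop such points instead is a design choice the real task may handle differently.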


Contributors:

Kuang-Yu Wang (NetID: kuangyu4, Email: kuangyu4@illinois.edu)
Ya Hsuan Yang (NetID: yhyang3, Email: yhyang3@illinois.edu)


Contribution Type

  • New Dataset
  • New Task
  • New Model

File Overview

Dataset

  • pyhealth/datasets/mimic3_cf.py
    Implements cohort construction, failure label generation, and MAP time-series extraction

  • pyhealth/datasets/configs/mimic3_cf.yaml
    Dataset configuration for MIMIC-III tables

Task

  • pyhealth/tasks/circulatory_failure_prediction.py
    Defines input/output schema and label generation logic

Example

  • examples/mimic3_cf_circulatory_failure_logreg.py
    Demonstrates dataset-task pipeline and ablation study

Tests

  • tests/core/test_circulatory_failure_prediction.py
  • tests/core/test_mimic3_cf.py

Ablation Study

We conduct ablation experiments on:

1. Prediction Window

  • 6 hours
  • 12 hours
  • 24 hours

Results show that shorter prediction windows yield stronger predictive signals, while longer horizons introduce uncertainty and reduce performance.
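One way to see how the window changes the task is to compute the time to the next event once and then threshold it per window. The sketch below uses made-up times and a hypothetical helper `hours_to_next_event`; it only illustrates how the positive set grows with the horizon and does not reproduce the ablation results.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def hours_to_next_event(
    times: Sequence[float], event_times: Sequence[float]
) -> List[Optional[float]]:
    """For each time t, hours until the next event strictly after t (None if none)."""
    events = sorted(event_times)
    gaps = []
    for t in times:
        idx = bisect_right(events, t)
        gaps.append(events[idx] - t if idx < len(events) else None)
    return gaps


# Hypothetical stay: measurements every 6 hours, one failure at hour 32.
times = [0.0, 6.0, 12.0, 18.0, 24.0, 30.0]
gaps = hours_to_next_event(times, event_times=[32.0])

positives_by_window = {}
for window in (6, 12, 24):
    # A time point is positive when the next event is at most `window` hours away.
    labels = [int(g is not None and g <= window) for g in gaps]
    positives_by_window[window] = sum(labels)

print(positives_by_window)  # {6: 1, 12: 2, 24: 4}
```

With a longer window, time points further from the event become positives, so the model has to separate them from negatives using measurements taken well before any deterioration.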

2. Feature Design

  • Baseline: MAP only
  • Advanced: MAP + temporal difference (map_diff)

The advanced feature setting improves recall by capturing temporal trends in physiological signals.
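A temporal-difference feature like `map_diff` can be sketched as the change from the previous MAP measurement. The helper name `add_map_diff` and the choice of 0.0 for the first point are assumptions for illustration; the PR's implementation may handle the first measurement or irregular sampling differently.

```python
from typing import List, Sequence


def add_map_diff(map_values: Sequence[float]) -> List[List[float]]:
    """Pair each MAP value with its change from the previous measurement."""
    features = []
    previous = None
    for value in map_values:
        # The first point has no predecessor, so its diff is set to 0.0 (assumption).
        diff = 0.0 if previous is None else value - previous
        features.append([value, diff])
        previous = value
    return features


# Hypothetical MAP readings in mmHg.
map_series = [80.0, 74.0, 69.0, 71.0]
features = add_map_diff(map_series)
print(features)  # [[80.0, 0.0], [74.0, -6.0], [69.0, -5.0], [71.0, 2.0]]
```

Two time points with the same MAP value now differ if one is falling and the other is stable, which is the kind of trend a MAP-only baseline cannot see.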

3. Fairness Analysis (Subgroup Evaluation)

We evaluate performance across gender subgroups (M/F) and observe substantial disparities in recall and ROC-AUC, consistent with findings in FAMEWS regarding fairness issues in early warning systems.
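Subgroup evaluation amounts to computing the same metric separately on each group's samples. The sketch below does this for recall on made-up labels and predictions; `recall_by_group` is a hypothetical helper, and the numbers are synthetic, not results from this PR.

```python
from collections import defaultdict
from typing import Dict, Sequence


def recall_by_group(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str]
) -> Dict[str, float]:
    """Recall (true positives / actual positives) per group.

    Groups without any positive sample are omitted, since recall is undefined there.
    """
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 1:
                true_positives[group] += 1
    return {g: true_positives[g] / positives[g] for g in positives}


# Synthetic example with four samples per gender subgroup.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["M", "M", "M", "M", "F", "F", "F", "F"]

recalls = recall_by_group(y_true, y_pred, groups)
print(recalls)  # {'M': 1.0, 'F': 0.5}
```

The same pattern applies to ROC-AUC: restrict scores and labels to one subgroup at a time, then compare the per-group values.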


Testing

All tests are implemented using synthetic data to ensure fast execution:

  • Task tests validate label generation logic and output format
  • Dataset tests validate integration with task and sample generation

Notes

  • Due to MIMIC-III de-identification, timestamps are shifted; only relative temporal relationships are used.
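Working with relative time can be sketched as converting each chart time to hours since admission. The helper `hours_since_admission` and the dates below are made up for illustration. Shifting every timestamp of a stay by the same offset leaves these values unchanged, so the date shift does not affect them.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def hours_since_admission(
    admit_time: datetime, chart_times: Sequence[datetime]
) -> List[float]:
    """Convert absolute chart times to hours elapsed since admission."""
    return [(t - admit_time).total_seconds() / 3600.0 for t in chart_times]


# Hypothetical shifted timestamps for one stay.
admit = datetime(2150, 3, 1, 8, 0)
charts = [
    datetime(2150, 3, 1, 9, 30),
    datetime(2150, 3, 1, 14, 0),
    datetime(2150, 3, 2, 8, 0),
]
offsets = hours_since_admission(admit, charts)
print(offsets)  # [1.5, 6.0, 24.0]

# Applying the same shift to every timestamp gives identical offsets.
shift = timedelta(days=3650)
shifted = hours_since_admission(admit + shift, [t + shift for t in charts])
print(shifted == offsets)  # True
```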
