Implementing Fairness in Real-World Healthcare Machine Learning through Datasheet for Database
Date
2024-05-28
Authors
Murugan, Anand
Advisor
Wong, Alexander
Rambhatla, Sirisha
Publisher
University of Waterloo
Abstract
Healthcare Machine Learning (HML) models are transforming the healthcare industry,
promising improved patient outcomes and better public health. However, it is
essential to ensure fairness, i.e., that models deliver equitable performance to all individuals,
irrespective of their inherent or acquired characteristics. This requires a thorough
examination of both the data used to train these models and their specific applications.
To explore the link between data and fairness in HML, this study conducted a systematic
survey, spanning six years, of models trained on the Medical Information Mart for Intensive
Care (MIMIC) clinical research database (CRD), one of the most popular and widely used
HML databases.
The results were striking: for the popular MIMIC IV ICU mortality task, a naive baseline
outperformed the state-of-the-art (SOTA) model in predictive performance and was also
fairer across subgroups, although it still showed some unfairness.
These findings underscore the urgent need to integrate fairness into HML models
and to involve practitioners more closely in HML modeling.
To this end, we propose a data-centric approach to fairness through our ‘Datasheet
for MIMIC IV v2.0 CRD’, modeled after recent work recommending datasheets for
datasets. Because MIMIC is large and complex, this datasheet is intended to help practitioners
identify data anomalies and task-specific feature-target relationships during modeling,
thereby fostering the development of equitable HML models.
Keywords
Fairness, healthcare machine learning, clinical research database, medical information mart for intensive care (MIMIC), risk prediction, Datasheet for MIMIC IV v2.0 CRD