Implementing Fairness in Real-World Healthcare Machine Learning through Datasheet for Database
Abstract
Healthcare Machine Learning (HML) models are revolutionizing the healthcare industry,
promising improved patient outcomes and enhanced public health. However, these models
must be fair, i.e., they must deliver equitable performance to all individuals,
irrespective of their inherent or acquired characteristics. Ensuring fairness requires a thorough
examination of both the data used to train these models and the specific applications they serve.
This study conducted a six-year systematic survey of models trained on the Medical
Information Mart for Intensive Care (MIMIC) clinical research database (CRD), one of
the most popular and widely used HML databases, to explore the link between data and
fairness in HML.
The results were striking: for the popular MIMIC IV ICU mortality task, a naive baseline
outperformed the state-of-the-art (SOTA) model in predictive performance and was also
fairer across subgroups, although it was still somewhat unfair. These findings
underscore the urgent need to integrate fairness into HML models
and to involve practitioners more closely in HML modeling.
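The abstract does not say which metric is used to judge fairness across subgroups. As a minimal sketch, assuming fairness is read as "equal true positive rate (recall) for every subgroup", one could compute recall per subgroup and report the largest gap between subgroups. The function names `subgroup_tpr` and `max_gap`, the choice of recall as the metric, and the toy labels are hypothetical illustrations, not the study's method or data.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def subgroup_tpr(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """True positive rate (recall) per subgroup.

    Assumption: binary labels (1 = positive outcome, e.g. mortality).
    Subgroups with no positive cases are skipped, since recall is undefined there.
    """
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred and groups must have the same length")
    positives: Dict[Hashable, int] = defaultdict(int)
    hits: Dict[Hashable, int] = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 1:
                hits[group] += 1
    # hits is a defaultdict, so groups with zero correct positives yield 0.0.
    return {g: hits[g] / positives[g] for g in positives}


def max_gap(per_group: Dict[Hashable, float]) -> float:
    """Largest difference between any two subgroups (0.0 means equal recall)."""
    if not per_group:
        return 0.0
    values = list(per_group.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Toy, made-up labels for illustration only (not MIMIC data).
    y_true = [1, 1, 0, 1, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 0, 1]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    rates = subgroup_tpr(y_true, y_pred, groups)
    print(rates, max_gap(rates))
```

Under this assumed metric, comparing two models (e.g., a naive baseline and a SOTA model) would mean running both through `subgroup_tpr` and checking which has the smaller `max_gap`; other fairness criteria, such as equal calibration or equal false positive rates, would require different per-group statistics.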
To achieve this, we propose a data-centric approach to fairness through our ‘Datasheet
for MIMIC IV v2.0 CRD’, modeled after recent work recommending datasheets for
datasets. Because MIMIC is large and complex, this datasheet will help practitioners
identify data anomalies and task-specific feature-target relationships during modeling,
thereby fostering the development of equitable HML models.
Cite this version of the work
Anand Murugan (2024). Implementing Fairness in Real-World Healthcare Machine Learning through Datasheet for Database. UWSpace. http://hdl.handle.net/10012/20624