Implementing Fairness in Real-World Healthcare Machine Learning through Datasheet for Database

Murugan, Anand

Files

Murugan_Anand.pdf (1.4 MB)

Date

2024-05-28

Authors

Murugan, Anand

Advisor

Wong, Alexander
Rambhatla, Sirisha

Publisher

University of Waterloo

Abstract

Healthcare Machine Learning (HML) models are revolutionizing the healthcare industry, promising improved patient outcomes and enhanced public health. However, it is essential to ensure fairness, i.e., that models deliver equitable performance to all individuals, irrespective of their inherent or acquired characteristics. This requires a thorough examination of the data used and the specific applications of these models. To explore the link between data and fairness in HML, this study conducted a six-year systematic survey of models trained on the Medical Information Mart for Intensive Care (MIMIC) clinical research database (CRD), one of the most popular and widely used HML databases. The results were striking: for the popular MIMIC IV ICU mortality task, a naive baseline outperformed the state-of-the-art (SOTA) model in prediction performance and was fairer across subgroups, although it was still somewhat unfair. These findings demonstrate the urgent need to integrate fairness into HML models and a greater need to include practitioners in HML modeling. To achieve this, we propose a data-centric approach to fairness through our ‘Datasheet for MIMIC IV v2.0 CRD’, modeled after recent work recommending datasheets for datasets. Given that MIMIC is large and complex, this datasheet will assist practitioners in identifying data anomalies and task-specific feature-target relationships during modeling, thereby fostering the development of equitable HML models.

Keywords

Fairness, healthcare machine learning, clinical research database, medical information mart for intensive care (MIMIC), risk prediction, Datasheet for MIMIC IV v2.0 CRD

URI

http://hdl.handle.net/10012/20624

Collections

Theses
Systems Design Engineering
