Implementing Fairness in Real-World Healthcare Machine Learning through Datasheet for Database

dc.contributor.author: Murugan, Anand
dc.date.accessioned: 2024-05-28T17:19:32Z
dc.date.available: 2024-05-28T17:19:32Z
dc.date.issued: 2024-05-28
dc.date.submitted: 2024-04-30
dc.description.abstract: Healthcare Machine Learning (HML) models are revolutionizing the healthcare industry, promising improved patient outcomes and enhanced public health. However, it is essential to ensure fairness, i.e., that models deliver equitable performance to all individuals, irrespective of their inherent or acquired characteristics. This requires a thorough examination of the data used and the specific applications of these models. To explore the link between data and fairness in HML, this study conducted a six-year systematic survey of models trained on the Medical Information Mart for Intensive Care (MIMIC) clinical research database (CRD), one of the most popular and widely used HML databases. The results were striking: for the popular MIMIC IV ICU mortality task, a naive baseline outperformed the state-of-the-art (SOTA) model in prediction performance and was fairer across subgroups (though still somewhat unfair). These findings demonstrate the urgent need to integrate fairness into HML models and a greater need to include practitioners in HML modeling. To achieve this, we propose a data-centric approach to fairness through our ‘Datasheet for MIMIC IV v2.0 CRD’, modeled after recent works recommending datasheets for datasets. Given that MIMIC is large and complex, this datasheet will assist practitioners in identifying data anomalies and task-specific feature-target relationships during modeling, thereby fostering the development of equitable HML models. [en]
dc.identifier.uri: http://hdl.handle.net/10012/20624
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.relation.uri: https://anonymous.4open.science/r/DatasheetCRD-1F00/README.md [en]
dc.relation.uri: https://github.com/criticalml-uw/MIMIC-IV-Fairness [en]
dc.subject: Fairness [en]
dc.subject: healthcare machine learning [en]
dc.subject: clinical research database [en]
dc.subject: medical information mart for intensive care (MIMIC) [en]
dc.subject: risk prediction [en]
dc.subject: Datasheet for MIMIC IV v2.0 CRD [en]
dc.title: Implementing Fairness in Real-World Healthcare Machine Learning through Datasheet for Database [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Applied Science [en]
uws-etd.degree.department: Systems Design Engineering [en]
uws-etd.degree.discipline: System Design Engineering [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0 [en]
uws.comment.hidden: The entire Datasheet for MIMIC IV v2.0 is available at https://anonymous.4open.science/r/DatasheetCRD-1F00/README.md, and the implementation code for ICU mortality prediction and fairness analysis is available in the https://github.com/criticalml-uw/MIMIC-IV-Fairness repository. [en]
uws.contributor.advisor: Wong, Alexander
uws.contributor.advisor: Rambhatla, Sirisha
uws.contributor.affiliation1: Faculty of Engineering [en]
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: Murugan_Anand.pdf
Size: 1.4 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission