Evaluating the Usefulness of Synthetic Data in Healthcare: Applications in Predictive Modeling and Privacy Protection

dc.contributor.author: Basri, Mohammad Ahmed
dc.date.accessioned: 2024-04-24T20:19:18Z
dc.date.available: 2024-04-24T20:19:18Z
dc.date.issued: 2024-04-24
dc.date.submitted: 2024-04-16
dc.description.abstract: Data-driven approaches in healthcare have opened new horizons for patient care, disease management, and medical research. A significant challenge, however, is the availability of large-scale, high-quality datasets: accessing health data that contains sensitive information involves lengthy approval processes and stringent restrictions. Synthetic data addresses this problem by replicating the statistical properties of real datasets. Because of the privacy concerns and regulatory restrictions associated with health data, there is a growing need for highly realistic synthetic health data, particularly in health data science initiatives. Although significant progress has been made in establishing recognized evaluation methods for synthetic data models, there remains a notable gap in understanding how best to improve the quality and usefulness of synthetic data. This thesis aims to bridge this gap by systematically evaluating objective functions for hyperparameter tuning of synthetic data generation and by studying the efficacy of synthetic data in predictive models. We evaluate synthetic data using three criteria: Fidelity, assessing how well it mirrors real-world data statistically; Utility, measuring its effectiveness for machine learning applications; and Privacy, evaluating the risk of re-identification. We also examine the usefulness of synthetic data for hyperparameter optimization of predictive models, particularly in scenarios where access to real data is constrained. We found a notable correlation between model performance on real data and on synthetic data, suggesting that hyperparameters optimized on synthetic data can be applied to real data for optimal results. Our study confirms the feasibility of using synthetic data on external computing resources to optimize models, addressing healthcare's computing constraints. [en]
dc.identifier.uri: http://hdl.handle.net/10012/20492
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.subject: synthetic health data [en]
dc.subject: machine learning [en]
dc.subject: healthcare [en]
dc.subject: public health [en]
dc.subject: hyperparameter tuning [en]
dc.subject: data utility [en]
dc.subject: data privacy [en]
dc.subject: data-driven healthcare [en]
dc.subject: predictive healthcare analytics [en]
dc.subject: model optimization [en]
dc.subject: patient privacy [en]
dc.subject: clinical data analysis [en]
dc.subject: medical research data [en]
dc.subject: health data science [en]
dc.title: Evaluating the Usefulness of Synthetic Data in Healthcare: Applications in Predictive Modeling and Privacy Protection [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Applied Science [en]
uws-etd.degree.department: Systems Design Engineering [en]
uws-etd.degree.discipline: System Design Engineering [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0 [en]
uws.contributor.advisor: Chen, Helen
uws.contributor.advisor: Wong, Alexander
uws.contributor.affiliation1: Faculty of Engineering [en]
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]
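
The abstract reports a correlation between model performance on real and synthetic data across hyperparameter settings. The record does not say how that correlation was computed, so the following is only a minimal sketch of one plausible check, assuming a Spearman rank correlation over per-configuration validation scores. The configuration names (`cfg_a` to `cfg_d`) and all scores are made-up placeholders for illustration, not results from the thesis.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical validation scores per hyperparameter configuration.
# These numbers are placeholders, NOT taken from the thesis.
synthetic_scores = {"cfg_a": 0.71, "cfg_b": 0.78, "cfg_c": 0.74, "cfg_d": 0.69}
real_scores = {"cfg_a": 0.73, "cfg_b": 0.80, "cfg_c": 0.75, "cfg_d": 0.74}

configs = sorted(synthetic_scores)
rho = spearman(
    [synthetic_scores[c] for c in configs],
    [real_scores[c] for c in configs],
)

# Pick the configuration that looks best on synthetic data, then measure
# how much real-data performance is lost by trusting that choice.
best_on_synthetic = max(configs, key=synthetic_scores.get)
regret = max(real_scores.values()) - real_scores[best_on_synthetic]

print(f"Spearman rho: {rho:.2f}")
print(f"Best config on synthetic data: {best_on_synthetic}")
print(f"Real-data regret of that choice: {regret:.2f}")
```

A high rank correlation together with a small regret would be consistent with the abstract's claim that tuning on synthetic data transfers to real data; a rank-based measure is used here because only the ordering of configurations matters when selecting one.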

Files

Original bundle
Name: Basri_MohammadAhmed.pdf
Size: 12.69 MB
Format: Adobe Portable Document Format
Description: Main Article

License bundle
Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission