Author: Basri, Mohammad Ahmed
Dates: 2024-04-24; 2024-04-24; 2024-04-24; 2024-04-16
URI: http://hdl.handle.net/10012/20492

Abstract: The advent of data-driven approaches in healthcare has opened new horizons for patient care, disease management, and medical research. However, one of the significant challenges is the availability of large-scale, high-quality datasets. Accessing health data that contains sensitive information requires lengthy approval processes and stringent restrictions. Synthetic data effectively addresses this dilemma by replicating the statistical properties of real datasets, offering a viable solution. Due to privacy concerns and regulatory restrictions associated with health data, there is a growing need for highly realistic synthetic health data, particularly in health data science initiatives. While significant advancements have been achieved in establishing recognized evaluation methods for synthetic data models, there remains a notable gap in understanding the optimal approaches to enhance the quality and usefulness of synthetic data. This thesis aims to bridge this gap by conducting a systematic evaluation of objective functions for hyperparameter tuning of synthetic data generation and studying the efficacy of synthetic data in predictive models. We evaluate synthetic data using three criteria: Fidelity, assessing how well it mirrors real-world data statistically; Utility, measuring its effectiveness for machine learning applications; and Privacy, evaluating the risk of re-identification. We examine the usefulness of synthetic data for the hyperparameter optimization process of predictive models, particularly in scenarios where access to real data is constrained. We found a notable correlation between model performance accuracy using real data and synthetic data, suggesting that parameters optimized with synthetic data are applicable to real data for optimal results. Our study confirms the feasibility of using synthetic data on external computing resources to optimize models, effectively addressing healthcare's computing constraints.

Language: en
Keywords: synthetic health data; machine learning; healthcare; public health; hyperparameter tuning; data utility; data privacy; data-driven healthcare; predictive healthcare analytics; model optimization; patient privacy; clinical data analysis; medical research data; health data science
Title: Evaluating the Usefulness of Synthetic Data in Healthcare: Applications in Predictive Modeling and Privacy Protection
Type: Master Thesis