Sharafoddini, Anis
2019-05-28
2019-05-28
2019-05-28
2019-05-01
http://hdl.handle.net/10012/14726
The growing adoption of Electronic Health Record (EHR) systems has resulted in an unprecedented amount of data. This availability has opened up the opportunity to use EHRs to provide more customized care for each patient by considering individual variability, which is the goal of precision medicine. In this context, patient similarity (PS) analytics have been introduced to facilitate data analysis by investigating similarities in patients’ data and, ultimately, to help improve the healthcare system. This dissertation is presented in six chapters and focuses on employing PS analytics in data-rich intensive care units. Chapter 1 reviews the literature and summarizes studies describing approaches for predicting patients’ future health status based on EHRs and PS. Chapter 2 demonstrates the informativeness of missing data in patient profiles and introduces missing data indicators to use this information in mortality prediction. The results demonstrate that including these indicators alongside observed measurements in a set of well-known prediction models (logistic regression, decision tree, and random forest) can improve predictive accuracy. Chapter 3 builds upon these results and uses the missing data indicators to reveal patient subpopulations based on similarity in the laboratory tests ordered for them. In this chapter, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method was employed to group patients into clusters using the indicators generated in the previous study. Results confirmed that missing data indicators capture laboratory-test-ordering patterns that are informative and can be used to identify similar patient subpopulations. Chapter 4 investigates the performance of a multifaceted PS metric constructed by applying appropriate similarity metrics to specific clinical variables (e.g., vital signs and ICD-9 codes). The proposed PS metric was evaluated in a 30-day post-discharge mortality prediction problem. Results demonstrate that PS-based prediction models using the new PS metric outperformed population-based prediction models. Moreover, the multifaceted PS metric significantly outperformed the cosine and Euclidean PS metrics in a k-nearest neighbors setting. Chapter 5 takes the previous results into consideration and looks for potential subpopulations among septic patients. Sepsis is one of the most common causes of death in Canada. The focus of this chapter is on longitudinal EHR data, which are collections of measurements recorded chronologically for each patient. This chapter employs Functional Principal Component Analysis to derive the dominant modes of variation in septic patients’ EHRs. Results confirm that including temporal data in the analysis can help in identifying subgroups of septic patients. Finally, Chapter 6 discusses the results from the previous chapters. The results indicate the informativeness of missing data and show how PS can help improve the performance of predictive modeling. Moreover, the results show that using temporal information in PS calculation improves patient stratification. The discussion also identifies limitations and directions for future research.
en
Toward Precision Medicine in Intensive Care: Leveraging Electronic Health Records and Patient Similarity
Doctoral Thesis