dc.contributor.author | XU, Yingying
dc.description.abstract | People are often interested in predicting a new or future observation. In clinical prediction, the uptake of Electronic Health Records (EHRs) has generated massive health datasets that are large in volume and diverse in variety. The outcomes can be of different types (e.g., continuous, binary, or time-to-event), and covariates can be either time-fixed or longitudinal. These datasets provide rich and diverse information for modeling and prediction, but they also pose challenges to fast and accurate prediction of outcomes of interest. One challenge arises when the data are heterogeneous in the relationship between the covariates and the outcome. In this case, localizing an informative subset of the data to aid prediction may well perform better than using all of the data. Chapter 3 deals with a continuous outcome: I develop a methodology that gives an interpretable and meaningful definition of similarity, together with an algorithm that uncovers the similarity structure and improves prediction accuracy by making similarity-based predictions. In Chapter 4, similarity-based prediction is extended to a survival outcome with possibly independent or dependent censoring. The algorithm is developed under the random forest framework, and I show through both simulations and a real data example that incorporating the similarity structure improves prediction accuracy in these cases. Another challenge arises when longitudinal covariates are present and a prediction must be made as early as practical, so that the full trajectory of the longitudinal covariates cannot be monitored before the prediction is required. In Chapter 5, I address this concern by quantifying the relationship between the earliness of prediction and the prediction accuracy. A penalization approach with a graphical method is introduced to select a monitoring window length for a specified prediction accuracy. Comprehensive simulations are conducted to investigate how the algorithm performs in selecting the monitoring window length across different scenarios. | en
dc.publisher | University of Waterloo | en
dc.title | New Methods for Improving Accuracy in Three Distinct Predictive Modeling Problems | en
dc.type | Doctoral Thesis | en
dc.pending | false
… and Actuarial Science | en
… of Waterloo | en
uws-etd.degree | Doctor of Philosophy | en
uws.contributor.advisor | Dubin, Joel
uws.contributor.advisor | Lee, Joon
uws.contributor.affiliation1 | Faculty of Mathematics | en

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
