New Methods for Improving Accuracy in Three Distinct Predictive Modeling Problems

dc.contributor.authorXU, Yingying
dc.date.accessioned2018-08-22T19:09:27Z
dc.date.available2018-08-22T19:09:27Z
dc.date.issued2018-08-22
dc.date.submitted2018-08-20
dc.description.abstractPeople are often interested in predicting a new or future observation. In clinical prediction, the uptake of Electronic Health Records (EHRs) has generated massive health datasets that are big in volume and diverse in variety. The outcomes can be of different types, e.g., continuous, binary, time-to-event, etc., and covariates can be either time-fixed or longitudinal. These datasets can provide rich and diverse information for modeling and prediction but also pose challenges to fast and accurate prediction of outcomes of interest. One challenge of predicting is that when the data are heterogeneous in the relationship between the covariates and the outcome. In this case, it is quite possible that localizing a subset of data in an informative manner to aid in making predictions will lead to better performance than including all information. Chapter 3 deals with a continuous outcome, and I have developed methodology that gives an interpretable and meaningful definition of similarity, and an algorithm to uncover the similarity structure to improve the prediction accuracy by making similarity-based predictions. In Chapter 4, the similarity-based prediction is extended to a survival outcome, with possible independent or dependent censoring. The algorithm is developed under the random forest framework, and I showed through both simulations and a real data example that incorporating the similarity structure indeed improves prediction accuracy in these cases. Another challenge in prediction arises when longitudinal covariates are present, and that there are scenarios when one needs to make an early prediction as soon as practical and thus cannot monitor the full trajectory of longitudinal covariates (before the prediction is required). In Chapter 5, I address this concern by quantifying the relationship between the earliness of prediction and the prediction accuracy. A penalization approach with a graphical method is introduced to select a monitoring window length given specific prediction accuracy. Comprehensive simulations are conducted to investigate the performance of the algorithm in selecting the length of the monitoring window in different scenarios.en
dc.identifier.urihttp://hdl.handle.net/10012/13644
dc.language.isoenen
dc.pendingfalse
dc.publisherUniversity of Waterlooen
dc.subjectStatisticsen
dc.titleNew Methods for Improving Accuracy in Three Distinct Predictive Modeling Problemsen
dc.typeDoctoral Thesisen
uws-etd.degreeDoctor of Philosophyen
uws-etd.degree.departmentStatistics and Actuarial Scienceen
uws-etd.degree.disciplineStatisticsen
uws-etd.degree.grantorUniversity of Waterlooen
uws.contributor.advisorDubin, Joel
uws.contributor.advisorLee, Joon
uws.contributor.affiliation1Faculty of Mathematicsen
uws.peerReviewStatusUnrevieweden
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
XU_Yingying.pdf
Size:
863.64 KB
Format:
Adobe Portable Document Format
Description:
Doctoral Thesis

License bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
6.08 KB
Format:
Item-specific license agreed upon to submission
Description: