Methods for Improving Performance of Precision Health Prediction Models
Date
2024-09-20
Authors
Advisor
Dubin, Joel A.
Publisher
University of Waterloo
Abstract
Prediction models developed for a specific index patient on a similar subpopulation have been shown to perform better than one-size-fits-all models. These models are often called personalized predictive models (PPMs), as they are tailored to specific individuals with unique characteristics. In this thesis, through a comprehensive set of simulation studies and data analyses, we investigate the relationship between the size of the similar subpopulation used to develop a PPM and model performance.
We propose an algorithm that fits a PPM using the size of similar subpopulation that optimizes both model discrimination and calibration, in response to the criticism that calibration is assessed less often than discrimination in predictive modelling. To tune the size of the subpopulation, we propose a loss function that extends a Brier score decomposition and consists of separate terms corresponding to model discrimination and calibration, respectively. A mixture weight in the loss allows one performance measure to be emphasized over the other. Through simulation study, we confirm previously reported results and show that the relationship between the size of subpopulation and discrimination is, in general, negative: as the size of the subpopulation increases, the discrimination of the model deteriorates. Further, we show that the relationship between the size of subpopulation and calibration is quadratic in nature, so both small and large subpopulation sizes result in relatively well-calibrated models. We also investigate the effect of patient weighting on performance and conclude, as expected, that the choice of subpopulation size has a larger effect on the PPM's performance than the weight function applied. We apply these methods to a dataset from the eICU database to predict the mortality of patients with diseases of the circulatory system.
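To make the idea of the mixture loss concrete, the following is a minimal R sketch, assuming the loss is a weighted combination of the reliability (calibration) and resolution (discrimination) terms of the Murphy decomposition of the Brier score. The binning scheme, the weight alpha, and the exact functional form are illustrative assumptions, not the specification used in the thesis.

# Hedged sketch: a mixture loss built from the Murphy decomposition of the
# Brier score (reliability ~ calibration, resolution ~ discrimination).
# Smaller values are better; 'alpha' and 'n_bins' are illustrative defaults.
mixture_loss <- function(p, y, alpha = 0.5, n_bins = 10) {
  # p: predicted probabilities; y: observed binary outcomes (0/1)
  bins <- cut(p, breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)
  n_k  <- tapply(y, bins, length)   # bin sizes (NA for empty bins)
  o_k  <- tapply(y, bins, mean)     # observed event rate per bin
  p_k  <- tapply(p, bins, mean)     # mean predicted probability per bin
  keep <- !is.na(n_k)
  N    <- length(y)
  rel  <- sum(n_k[keep] * (p_k[keep] - o_k[keep])^2) / N  # calibration term
  res  <- sum(n_k[keep] * (o_k[keep] - mean(y))^2) / N    # discrimination term
  alpha * rel - (1 - alpha) * res
}

Setting alpha closer to 1 emphasizes calibration, while values closer to 0 emphasize discrimination, mirroring the flexibility described above.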
We then extend the algorithm by proposing a more general loss function that allows further flexibility in choosing which measures of model discrimination and calibration enter the function used to tune the size of subpopulation. We also recommend bounds on the grid of values used in tuning, to reduce the computational burden of the algorithm. Before recommending bounds, we further investigate the relationship between the size of subpopulation and discrimination, as well as between the size of subpopulation and calibration, under 12 different simulated datasets to determine whether the results from the previous investigation are robust. We find that the relationship between the size of subpopulation and discrimination is always negative, and that the relationship between the size of subpopulation and calibration, although not entirely consistent across the 12 cases, shows that a small subpopulation size is good, if not optimal, in many of the cases considered. Based on this study, we recommend a lower bound for the grid of values of 20% of the entire training dataset, and an upper bound of either 50% or 70% of the training dataset, depending on the interests of the study. We apply the proposed methods to both simulated and real data, specifically the same dataset from the eICU database, and show that the results seen previously are robust and that the choice of measures in the general loss function has an effect on the optimal size of subpopulation chosen.
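A rough R sketch of the bounded grid search is given below. The cosine similarity measure and logistic-regression PPM loosely mirror the setting described here, but the simple hold-out evaluation, the step size, and all defaults are illustrative assumptions rather than the thesis's procedure.

# Hedged sketch: tune the subpopulation size for one index patient over the
# recommended grid (20% to 70% of the training data). X: numeric covariate
# matrix; y: binary outcome; x0: index patient's covariates; loss_fn: any
# general loss, e.g. mixture_loss() from the sketch above.
cosine_sim <- function(X, x0) {
  as.numeric(X %*% x0) / (sqrt(rowSums(X^2)) * sqrt(sum(x0^2)))
}

tune_subpop_size <- function(X, y, x0, loss_fn, lower = 0.2, upper = 0.7,
                             step = 0.1) {
  ord    <- order(cosine_sim(X, x0), decreasing = TRUE)  # most similar first
  fracs  <- seq(lower, upper, by = step)
  losses <- sapply(fracs, function(f) {
    idx  <- ord[seq_len(ceiling(f * nrow(X)))]           # similar subpopulation
    dat  <- data.frame(y = y[idx], X[idx, , drop = FALSE])
    fit  <- glm(y ~ ., data = dat, family = binomial)    # PPM for this size
    hold <- setdiff(seq_len(nrow(X)), idx)               # simple hold-out
    p    <- predict(fit, newdata = data.frame(X[hold, , drop = FALSE]),
                    type = "response")
    loss_fn(p, y[hold])
  })
  fracs[which.min(losses)]                               # chosen fraction
}

Any combination of discrimination and calibration measures can be supplied through loss_fn, which is the flexibility the general loss function is meant to provide.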
Finally, we extend the algorithm to predict the longitudinal, continuous outcome trajectory of an index patient, rather than a binary outcome. We investigate the relationship between the size of subpopulation and the mean absolute error, and find that performance improves drastically up to a point and then stabilizes, so the model fit to the full training data is optimal, though only slightly better than a model fit to 60% of the subpopulation. As these results are counter-intuitive, we present three further simulation studies showing that they stem from predicting a patient's trajectory rather than from predicting a continuous outcome. Although why this is the case remains an open research question, we speculate that, since the personalized approach still performs comparably to the full model, these results can be attributed to testing the methods on a small sample size. Due to the computational intensity of the methods, however, testing on a larger sample size to generalize these results is currently impractical.
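For concreteness, the error measure in this longitudinal setting can be illustrated as below; the data frame, column names, and toy values are hypothetical and only show how a per-patient trajectory MAE might be computed, not how it is computed in the thesis.

# Hedged illustration: mean absolute error between an index patient's observed
# and predicted outcome values across follow-up times.
trajectory_mae <- function(traj) {
  mean(abs(traj$y_obs - traj$y_pred))
}

# Toy 5-visit trajectory (hypothetical values)
toy <- data.frame(time   = 1:5,
                  y_obs  = c(2.1, 2.4, 2.9, 3.5, 3.9),
                  y_pred = c(2.0, 2.6, 3.0, 3.2, 4.1))
trajectory_mae(toy)  # average absolute error over the 5 time points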
Areas of future work include improving the computational efficiency of these methods, which would allow the same relationships to be investigated under more complex models, such as random forests or gradient boosting. Further investigation of personalized predictive model performance when predicting a trajectory should also be considered. The methods presented in this thesis will be implemented in an R package to allow for greater usability.
Description
Keywords
personalized prediction, precision medicine, prediction, model calibration, model discrimination, cosine similarity