Browsing by Author "Dubin, Joel A."
Item: Dynamic Treatment Regimes and Interference in Dyadic Networks: A Joint Optimization Approach (University of Waterloo, 2023-09-18)
Authors: Mussavi Rizi, Marzieh; Dubin, Joel A.; Wallace, Michael P.

Identifying interventions that are optimally tailored to each individual is of significant interest in many fields, particularly precision medicine. Dynamic treatment regimes (DTRs) employ sequences of decision rules that use individual patient information to recommend treatments. However, the assumption that an individual's treatment does not affect the outcomes of others, known as the no-interference assumption, is often violated in practical settings. For example, in infectious disease studies, the vaccination status of individuals in close proximity can influence the likelihood of infection. Imposing this assumption when it does not in fact hold may lead to biased results and undermine the validity of the resulting DTR optimization. In this thesis, we extend dynamic weighted ordinary least squares (dWOLS), a doubly robust and easily implemented approach for estimating optimal DTRs, to incorporate interference. Specifically, we develop new methodologies for optimizing DTRs in the presence of interference for both binary and continuous treatments. Through comprehensive simulations and an analysis of the Population Assessment of Tobacco and Health (PATH) data, we demonstrate the performance of the proposed joint optimization strategy relative to current state-of-the-art conditional optimization methods. Furthermore, we extend dWOLS to accommodate multiple outcomes and patient-specific costs, enhancing its flexibility and applicability in complex health contexts.
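For reference, the base dWOLS estimator that this thesis extends can be illustrated with a minimal single-stage sketch in Python, with no interference, a single tailoring covariate, and simulated data; the variable names and the simulated outcome model are assumptions for illustration, not the thesis's implementation. The idea is a weighted least squares regression of the outcome on treatment-free and treatment-by-covariate (blip) terms, with balancing weights |A - pi(x)| built from an estimated propensity score.

    import numpy as np
    import statsmodels.api as sm
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 500
    x = rng.normal(size=n)                                   # a single tailoring covariate
    A = rng.binomial(1, 1 / (1 + np.exp(-x)))                # treatment assignment depends on x
    Y = x + A * (1.0 - 1.5 * x) + rng.normal(size=n)         # true blip: 1.0 - 1.5 * x

    # 1) estimate the propensity score pi(x) = P(A = 1 | x)
    pi_hat = LogisticRegression().fit(x.reshape(-1, 1), A).predict_proba(x.reshape(-1, 1))[:, 1]

    # 2) balancing weights |A - pi_hat| give dWOLS its double robustness
    w = np.abs(A - pi_hat)

    # 3) weighted OLS with treatment-free terms (1, x) and blip terms (A, A * x)
    design = np.column_stack([np.ones(n), x, A, A * x])
    fit = sm.WLS(Y, design, weights=w).fit()
    psi0, psi1 = fit.params[2], fit.params[3]

    # 4) estimated rule: recommend treatment whenever the estimated blip is positive
    recommend_treatment = (psi0 + psi1 * x > 0).astype(int)

The weights are what make the single-stage estimator doubly robust: the blip estimates remain consistent if either the treatment-free model or the propensity model is correctly specified. The thesis's contribution, handling interference between dyads as well as multiple outcomes and costs, is not captured by this sketch.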
Item: Methods for Improving Performance of Precision Health Prediction Models (University of Waterloo, 2024-09-20)
Authors: Krikella, Tatiana; Dubin, Joel A.

Prediction models developed on a subpopulation similar to a specific index patient have been shown to perform better than one-size-fits-all models. These models are often called personalized predictive models (PPMs), as they are tailored to individuals with unique characteristics. In this thesis, through a comprehensive set of simulation studies and data analyses, we investigate the relationship between the size of the similar subpopulation used to develop a PPM and model performance. We propose an algorithm that fits a PPM using the subpopulation size that optimizes both model discrimination and calibration, since calibration is often assessed less thoroughly than discrimination in predictive modelling. We do this by proposing a loss function, used when tuning the subpopulation size, that extends a Brier score decomposition and consists of separate terms corresponding to model discrimination and calibration; a mixture term allows one performance measure to be emphasized over the other. Through a simulation study, we confirm previously reported results and show that the relationship between subpopulation size and discrimination is, in general, negative: as the subpopulation grows, the model's discrimination deteriorates. Further, we show that the relationship between subpopulation size and calibration is quadratic in nature, so both small and large subpopulations yield relatively well-calibrated models.

We investigate the effect of patient weighting on performance and conclude, as expected, that the choice of subpopulation size has a larger effect on the PPM's performance than the weight function applied. We apply these methods to a dataset from the eICU database to predict the mortality of patients with diseases of the circulatory system. We then extend the algorithm by proposing a more general loss function that allows further flexibility in choosing which measures of model discrimination and calibration enter the tuning criterion, and we recommend bounds on the tuning grid to reduce the computational burden of the algorithm. Before recommending bounds, we further investigate the relationships between subpopulation size and discrimination, and between subpopulation size and calibration, across 12 different simulated datasets, to determine whether the earlier results are robust. We find that the relationship between subpopulation size and discrimination is consistently negative, while the relationship with calibration, although not entirely consistent across the 12 cases, shows that a small subpopulation is good, if not optimal, in many of the cases considered. Based on this study, we recommend a lower bound for the tuning grid of 20% of the training dataset, and an upper bound of either 50% or 70% of the training dataset, depending on the interests of the study. We apply the proposed methods to both simulated and real data, specifically the same dataset from the eICU database, and show that the earlier results are robust and that the choice of measures in the general loss function affects the optimal subpopulation size chosen.

Finally, we extend the algorithm to predict the longitudinal, continuous outcome trajectory of an index patient rather than a binary outcome. We investigate the relationship between subpopulation size and mean absolute error and find that performance improves sharply up to a point and then stabilizes, so the model fit to the full training data is optimal, though only slightly better than a model fit to a 60% subpopulation. As these results are counter-intuitive, we present three further simulation studies showing that they stem from predicting a patient's trajectory rather than from predicting a continuous outcome. Although why this occurs remains an open research question, we speculate that, since the personalized approach still performs comparably to the full model, the results can be attributed to testing the methods on a small sample size; the computational intensity of the methods currently makes testing on a larger sample impractical. Areas of future work include improving the computational efficiency of the methods, which would allow the same relationships to be investigated under more complex models such as random forests or gradient boosting, and further study of personalized predictive model performance when predicting a trajectory. The methods presented in this thesis will be implemented in an R package to allow for greater usability.
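To make the tuning idea above concrete, here is a hedged Python sketch: for each candidate fraction on a grid running from 20% to 70% of the training data (the recommended bounds), fit a working model to the index patient's most similar training patients and score it with a mixture of a calibration term and a discrimination term taken from a Murphy-style Brier score decomposition. The Euclidean similarity measure, the logistic working model, the in-sample scoring, and the exact form of the mixture loss are illustrative assumptions, not the thesis's algorithm.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def mixture_loss(y, p, alpha=0.5, n_bins=10):
        # Murphy-style Brier decomposition terms: reliability (calibration, smaller is better)
        # and resolution (discrimination, larger is better); alpha mixes the two.
        bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)
        rel, res, ybar = 0.0, 0.0, y.mean()
        for b in range(n_bins):
            m = bins == b
            if m.any():
                rel += m.mean() * (p[m].mean() - y[m].mean()) ** 2
                res += m.mean() * (y[m].mean() - ybar) ** 2
        return alpha * rel - (1 - alpha) * res

    def tune_subpopulation(X_train, y_train, x_index, grid=np.arange(0.2, 0.71, 0.1)):
        dist = np.linalg.norm(X_train - x_index, axis=1)      # similarity to the index patient
        order = np.argsort(dist)
        best = None
        for frac in grid:
            idx = order[: int(frac * len(y_train))]           # the frac most similar patients
            if len(np.unique(y_train[idx])) < 2:
                continue                                      # need both outcome classes to fit
            model = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
            p = model.predict_proba(X_train[idx])[:, 1]       # in-sample scoring for brevity
            loss = mixture_loss(y_train[idx], p)
            if best is None or loss < best[0]:
                best = (loss, frac, model)
        return best                                           # (loss, chosen fraction, fitted PPM)

In practice the loss would be estimated out-of-sample (for example, by cross-validation within the subpopulation) rather than on the fitting data as in this sketch.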
Item: Methods in Functional Data Analysis: Forecast Evaluation, Robust Serial Dependence Measures, and a Spatial Factor Copula Model (University of Waterloo, 2023-09-05)
Authors: Yeh, Chi-Kuang; Rice, Gregory; Dubin, Joel A.

With advancements in technology, new types of data have become available, including functional data: observations in the form of functions or curves rather than scalar- or vector-valued quantities. This emerging area presents unique challenges in handling intrinsically infinite-dimensional objects. In this thesis, we focus primarily on three problems, each with a distinct flavour of functional data analysis.

In Chapter 1, we provide an overview of the foundational concepts and methodologies that serve as a basis for the subsequent chapters, including functional data analysis, functional time series analysis, probabilistic forecasts, copula modelling, and robust methods. We conclude the chapter with a comprehensive list of the main contributions of this thesis.

In Chapter 2, motivated by the goal of evaluating real-time forecasts of home-team win probabilities in the National Basketball Association, we develop new tools for measuring the quality of continuously updated probabilistic forecasts. We introduce calibration surface plots, and simple graphical summaries of them, to evaluate at a glance whether a given continuously updated probability forecasting method is well calibrated, and we develop statistical tests and graphical tools to evaluate the skill, or relative performance, of two competing continuously updated forecasting methods. These tools are demonstrated in an application evaluating the continuously updated forecasts published by the United States-based multinational sports network ESPN on its principal webpage, espn.com. This application provides statistical evidence that the forecasts published there are well calibrated and exhibit improved skill over several naïve models, but do not show significantly improved skill over simple logistic regression models based solely on a measure of each team's relative strength and the evolving score difference throughout the game.

In Chapter 3, we propose a new autocorrelation measure for functional time series that we term "spherical autocorrelation." It is based on measuring the average angle between lagged pairs of series after they have been projected onto a unit sphere. This new measure enjoys at least two complementary advantages over existing autocorrelation measures for functional data: 1) it describes a notion of "sign" or "direction" of serial dependence in the series, and 2) it is more robust to outliers. The asymptotic properties of estimators of the spherical autocorrelation are established and used to construct confidence intervals and portmanteau white noise tests. These confidence intervals and tests are shown to be effective in simulation experiments and in applications to model selection for daily electricity price curves and to measuring volatility in densely observed asset price data.
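To make the "average angle between lagged pairs" idea concrete, the sketch below computes a lag-h cosine summary of this flavour for a functional time series stored as an (n x p) matrix of curves observed on a common grid; the centring step and the use of the mean cosine as the summary are illustrative assumptions rather than the chapter's exact estimator.

    import numpy as np

    def spherical_autocorrelation(curves, lag, eps=1e-12):
        # curves: (n, p) array, each row one curve observed on a common grid of p points
        centred = curves - curves.mean(axis=0)                  # subtract the sample mean function
        norms = np.linalg.norm(centred, axis=1, keepdims=True)
        sphere = centred / np.maximum(norms, eps)               # project each centred curve onto the unit sphere
        cosines = np.sum(sphere[lag:] * sphere[:-lag], axis=1)  # cos(angle) between curves lag apart
        return cosines.mean()                                   # signed: sign indicates direction of dependence

    # Example: a functional AR(1)-type series gives positive values at small lags.
    rng = np.random.default_rng(2)
    grid = np.linspace(0, 1, 50)
    X = np.zeros((300, grid.size))
    for t in range(1, 300):
        noise = np.interp(grid, np.linspace(0, 1, 10), rng.standard_normal(10))  # a smooth-ish noise curve
        X[t] = 0.6 * X[t - 1] + noise
    print([round(spherical_autocorrelation(X, h), 3) for h in range(1, 4)])

Because each curve is rescaled to unit norm before the lagged inner products are taken, a single outlying curve with a very large norm cannot dominate the measure, which is the intuition behind the robustness advantage described above.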
In Chapter 4, we propose a new model for spatial functional data that departs from the commonly adopted assumption of normally distributed errors. Instead, we assume the existence of a common process that equally affects the measurements at all locations at each time point. By using general copulas, the model can accommodate heavy tails and tail asymmetry, which existing methods may fail to capture. We derive a closed-form expression for the likelihood function when the tail dependence is generated by an exponential distribution. Simulation studies show that the parameter estimates of the proposed method accurately capture the spatial and temporal dependence when the model is correctly specified; when the model is misspecified, the method remains robust in capturing the spatial dependence and the general shape of the common mean function. We close the chapter by discussing future work and potential extensions of the proposed model.

We conclude the thesis in Chapter 5 with concise summaries of each chapter and further discussion. We also offer directions for future research arising from each chapter, highlighting potential applications of the proposed methods, and explore theoretical and computational avenues that may prove beneficial to practitioners and researchers, extending the scope of the proposed methods to research, applications, and beyond.
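Purely as a toy illustration of the common-process idea in Chapter 4 above, the following simulates one standard common-factor construction from the factor-copula literature: a shared exponential shock at each time point added to spatially correlated Gaussian noise, which induces upper tail dependence across locations. The exponential-plus-Gaussian form, the covariance kernel, and all parameter values are assumptions chosen for illustration and are not necessarily the model derived in the thesis.

    import numpy as np

    rng = np.random.default_rng(3)
    n_t, n_s = 200, 10                                         # time points and spatial locations
    coords = rng.uniform(0, 1, size=(n_s, 2))                  # random site locations in the unit square
    dists = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    L = np.linalg.cholesky(np.exp(-dists / 0.3))               # exponential covariance for the Gaussian part
    V = rng.exponential(scale=1.0, size=(n_t, 1))              # common shock, shared by all sites at time t
    eps = rng.standard_normal((n_t, n_s)) @ L.T                # spatially correlated Gaussian errors
    mu = np.sin(2 * np.pi * np.arange(n_t) / n_t)              # a smooth common mean function
    Y = mu[:, None] + V + eps                                  # observed spatial functional data (n_t x n_s)

Because the shock V is shared by every location, unusually large values tend to appear at all sites simultaneously, mimicking the kind of joint extremes that a purely Gaussian error model would understate.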