Survival Analysis of Complex Featured Data with Measurement Error
Date
2019-08-22
Authors
Chen, Li-Pang
Advisor
Yi, Grace Y.
Publisher
University of Waterloo
Abstract
Survival analysis plays an important role in many fields, such as cancer research, clinical trials, epidemiological studies, and actuarial science. A large body of methods for analyzing survival data has been developed; however, many important problems have not yet been fully explored. In this thesis, we focus on the analysis of survival data with complex features.
In Chapter 1, we review relevant topics, including survival analysis, measurement error models, graphical models, and variable selection.
Graphical models are useful for characterizing the dependence structure among variables. They are commonly used to analyze high-dimensional data, including genetic data and data with network structures. Many estimation procedures have been developed under various graphical models, but they rest on the stringent assumption that the associated variables are measured precisely. In applications, however, this assumption is often unrealistic, and mismeasured variables are common in data. In Chapter 2, we investigate high-dimensional graphical models with error-prone variables. We propose valid estimation procedures that account for measurement error effects, establish theoretical results for the proposed methods, and report numerical studies to assess their performance.
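As a minimal illustration of why error-prone variables matter for graphical models, assume the classical additive error model W = X + e, where e is independent of X and has a known covariance Σ_e. These are assumptions made here for illustration only; the abstract does not state the error model used in Chapter 2. Under them, cov(W) = Σ_X + Σ_e, so the precision matrix Σ_X^{-1}, which encodes the edges of a Gaussian graphical model, is estimated with bias if W is used naively. The low-dimensional sketch below compares the naive inverse with a simple moment correction that subtracts Σ_e. The function name `corrected_covariance` and all simulation settings are hypothetical, and this is not the high-dimensional estimator proposed in the thesis.

```python
import numpy as np


def corrected_covariance(w, sigma_e):
    """Moment-corrected estimate of cov(X) from error-prone W = X + e.

    Illustrative assumptions: classical additive error, e independent of X,
    and a known error covariance sigma_e. Not the thesis estimator.
    """
    s_w = np.cov(w, rowvar=False)  # unbiased sample covariance of observed W
    return s_w - sigma_e           # since cov(W) = Sigma_X + Sigma_e


rng = np.random.default_rng(0)
p, n = 3, 200_000

# Hypothetical chain graph X1 - X2 - X3: the true precision matrix is tridiagonal.
theta = np.array([[2.0, -0.8, 0.0],
                  [-0.8, 2.0, -0.8],
                  [0.0, -0.8, 2.0]])
sigma_x = np.linalg.inv(theta)
sigma_e = 0.5 * np.eye(p)          # hypothetical, assumed-known error covariance

x = rng.multivariate_normal(np.zeros(p), sigma_x, size=n)      # true, unobserved
w = x + rng.multivariate_normal(np.zeros(p), sigma_e, size=n)  # observed surrogate

naive_theta = np.linalg.inv(np.cov(w, rowvar=False))           # ignores the error
corrected_theta = np.linalg.inv(corrected_covariance(w, sigma_e))

print(np.round(naive_theta, 2))
print(np.round(corrected_theta, 2))
```

In this setup the naive precision estimate is noticeably shrunk toward zero, while the corrected one recovers the tridiagonal chain-graph pattern up to sampling noise. In high dimensions the corrected matrix S_W − Σ_e need not be positive definite, which is one reason dedicated estimators are needed.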
In Chapter 3, we consider survival analysis with network structures and measurement error in covariates. In survival data analysis, the Cox proportional hazards (PH) model is perhaps the most widely used model for characterizing the dependence of survival times on covariates. While many inference methods have been developed under this model or its variants, these models are not adequate for handling data with complex structured covariates. High-dimensional survival data often have several features: (1) many covariates are inactive in explaining the survival information, (2) active covariates are associated through a network structure, and (3) some covariates are error-contaminated. To handle such survival data, we propose graphical proportional hazards measurement error models and develop inferential procedures for the parameters of interest. The proposed models substantially enlarge the scope of the usual Cox PH model and offer great flexibility in characterizing survival data. Theoretical results are established to justify the proposed methods, and numerical studies are conducted to assess their performance.
In Chapter 4, we focus on sufficient dimension reduction for high-dimensional survival data with covariate measurement error. Sufficient dimension reduction (SDR) is an important tool in regression analysis that reduces the dimension of covariates without losing predictive information. Several methods have been proposed to handle data with either censoring in the response or measurement error in covariates; however, little research is available on data having both features simultaneously. The analysis becomes even more challenging when data contain ultrahigh-dimensional covariates. We start by considering the cumulative distribution function in regular settings and propose a valid SDR method that incorporates the effects of both censoring and covariate measurement error. Next, we extend the proposed method to handle ultrahigh-dimensional data. Theoretical results for the proposed methods are established, and numerical studies are reported to assess their performance.
In Chapter 5, we shift our attention slightly to sampling issues concerning survival data. Specifically, we discuss survival analysis for left-truncated and right-censored data with covariate measurement error. Many methods have been developed for analyzing survival data, which commonly involve right-censoring. These methods, however, are challenged by complex features pertinent to data collection as well as the nature of the data themselves. In particular, biased samples caused by left-truncation or length-biased sampling, together with measurement error, often accompany survival data. Although such data frequently arise in practice, little work on them is available in the literature. In Chapter 5, we study this important problem and explore valid inference methods for handling left-truncated and right-censored survival data with measurement error under the widely used Cox model. We exploit a flexible estimator of the survival model parameters that does not require specification of the baseline hazard function. To improve efficiency, we further develop an augmented non-parametric maximum likelihood estimator. We establish asymptotic results for the proposed estimators and examine their efficiency and robustness. The proposed methods have the appealing feature that the distributions of the covariates and of the truncation times are left unspecified. Numerical studies are reported to assess the performance of the proposed methods.
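Under the Cox model the hazard is λ(t | X) = λ0(t) exp(βᵀX), and the partial likelihood eliminates the baseline hazard λ0, which is why estimators built on it need not specify λ0. With left truncation, a subject with entry (truncation) time A_j and observed time Y_j is at risk at time t only if A_j ≤ t ≤ Y_j. The sketch below shows this standard risk-set adjustment in the naive partial likelihood that uses covariates exactly as observed. It is an illustration under these assumptions, not the measurement-error-corrected or augmented estimators of Chapter 5, and the function names are hypothetical.

```python
import numpy as np


def risk_set(t, entry, exit_time):
    """Indices of subjects at risk at time t under left truncation.

    A subject counts only if it has entered the study (entry <= t) and has
    not yet failed or been censored (exit_time >= t).
    """
    return np.flatnonzero((entry <= t) & (exit_time >= t))


def neg_log_partial_likelihood(beta, entry, exit_time, event, x):
    """Negative Cox log partial likelihood with left-truncation risk sets.

    Naive form: x is used as observed, with no measurement error correction.
    """
    eta = x @ beta                          # linear predictor beta' x_j
    nll = 0.0
    for i in np.flatnonzero(event):         # sum over observed failures only
        r = risk_set(exit_time[i], entry, exit_time)
        nll -= eta[i] - np.log(np.sum(np.exp(eta[r])))
    return nll


# Hypothetical toy data: subject 1 enters at time 2, after subject 0 fails.
entry = np.array([0.0, 2.0, 0.0])
exit_time = np.array([1.0, 3.0, 4.0])
event = np.array([1, 1, 0])
x = np.array([[0.5], [-1.0], [1.5]])

print(risk_set(1.0, entry, exit_time))  # subject 1 is excluded: not yet entered
print(neg_log_partial_likelihood(np.zeros(1), entry, exit_time, event, x))
```

At β = 0 each failure contributes −log of its risk-set size, so the toy value is 2 log 2; ignoring truncation would wrongly put subject 1 in the first risk set and give log 3 + log 2 instead. The resulting function can be passed to a general-purpose optimizer to obtain the naive estimate of β.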
In Chapter 6, we study outstanding issues in model selection and model averaging for survival data with measurement error. Model selection plays a critical role in statistical inference, and a vast literature has been devoted to this topic. Despite this extensive research attention, gaps remain. An important but unexplored problem concerns model selection for truncated and censored data with measurement error. Although the analysis of left-truncated and right-censored (LTRC) data has received extensive interest in survival analysis, there has been no research on model selection for LTRC data, let alone LTRC data involving measurement error. In Chapter 6, we take up this problem and develop inferential procedures for model selection for LTRC data with measurement error in covariates. Our development employs the local model misspecification framework and emphasizes the use of the focused information criterion (FIC). We develop valid estimators using the model averaging scheme and establish theoretical results to justify the validity of our methods. Numerical studies are conducted to assess the performance of the proposed methods.
Finally, Chapter 7 concludes the thesis with a summary and discussion.