Studying complex relationships between correlated responses and their associated covariates has attracted much research interest, and numerous approaches have been developed to model correlated responses. However, most available methods rely on the crucial condition that the response variables are precisely measured. In practice, this condition is often violated for reasons related to data collection, study design, or the nature of the variables. When mismeasurement in the variables is ignored, inference results are often biased.
Although measurement error and misclassification have been extensively studied in the literature, many problems of mismeasurement in correlated responses remain unexplored. The first problem of interest concerns measurement error and misclassification in the joint modeling of continuous and binary responses. In Chapter 2, we consider the setting of a bivariate outcome vector that contains a continuous component and a binary component, both subject to mismeasurement. We propose an induced likelihood approach and describe an EM algorithm that handles measurement error in the continuous response and misclassification in the binary response simultaneously. The algorithm is fast and easy to implement. Simulation studies confirm that the proposed methods successfully remove the bias induced by response mismeasurement. We apply the proposed methods to mouse data arising from a genome-wide association study.
As a complement to the likelihood-based methods discussed in Chapter 2, in Chapter 3 we explore the bivariate generalized estimating equation method with mixed responses subject to measurement error and misclassification. The generalized estimating equation method enjoys robustness to certain model misspecification as well as consistency in the estimation of the mean structure parameters. However, the consistency property relies on the unbiasedness of the estimating functions, which can break down in the presence of measurement error and misclassification in the responses. We propose an insertion strategy to simultaneously account for measurement error effects in a continuous response and misclassification effects in a binary response. We consider scenarios where either an internal or an external validation subsample is available.
In Chapter 4, we consider a more complex situation where the covariates are high-dimensional and may possess a network structure. We start with the case where the data are precisely measured and propose a generalized network structure model together with a two-step inferential procedure. In the first step, we employ a Gaussian graphical model to characterize the network structure; in the second step, we incorporate the estimated graphical structure of the covariates and develop an estimating equation method. Furthermore, we extend the development to accommodate mismeasured responses. We consider two cases, where either the information on mismeasurement is known or a validation sample is available. Theoretical results are established, and numerical studies are conducted to assess the performance of the proposed methods.
In contrast to the error-prone continuous and binary responses considered in the first three chapters, we next investigate error-corrupted count data, in particular zero-inflated counts, a problem that has received little attention. Zero-inflated count data arise frequently in cancer genomics studies, and it is often of interest to incorporate the feature of excessive zeros in the analysis. However, measurement error in count responses has rarely been studied, let alone the zero-inflated Poisson model with measurement error. In Chapter 5, we propose a novel measurement error model designed specifically for error-contaminated count data. We show that ignoring measurement error effects when analyzing the count response generally leads to invalid inference results, and we also identify situations where ignoring measurement error can still yield consistent estimators. Furthermore, we propose a Bayesian method to address the effects of measurement error under the zero-inflated Poisson model, and we develop a data-augmentation algorithm that is easy to implement. Simulation studies are conducted to evaluate the performance of the proposed method. We apply our method to analyze a set of prostate adenocarcinoma genomics data.
Finally, in Chapter 6, we examine another type of correlated response: time series data. We consider the autoregressive model and establish analytical results that quantify the biases in parameter estimation when measurement error effects are neglected. We propose two measurement error models to describe different processes of data contamination. An estimating equation approach is proposed for estimating the model parameters with measurement error effects accounted for. We further discuss forecasting using error-prone time series data. This work is motivated by the need to understand the evolving situation of the COVID-19 pandemic. It is important to assess how the mortality rate may change over time, but error-contaminated COVID-19 data present a considerable challenge in uncovering the true development path of the disease.
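
As a rough illustration of why neglecting measurement error biases autoregressive estimates, the following minimal sketch simulates an AR(1) series and contaminates it with classical additive error. The additive error model, the parameter values, and the function names `simulate_ar1` and `lag1_estimate` are assumptions made here for illustration only; they are not the two measurement error models or the estimating equation approach of Chapter 6. Under this assumed model, the naive least-squares estimate of the autoregressive coefficient converges to the true coefficient multiplied by the reliability ratio Var(X) / (Var(X) + Var(e)), so it is attenuated toward zero.

```python
import random


def simulate_ar1(n, phi, sigma_eps, rng):
    """Simulate a zero-mean stationary AR(1) series X_t = phi * X_{t-1} + eps_t."""
    # Start from the stationary distribution so no burn-in is needed.
    x = rng.gauss(0.0, sigma_eps / (1.0 - phi ** 2) ** 0.5)
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma_eps)
        series.append(x)
    return series


def lag1_estimate(series):
    """Least-squares estimate of the AR(1) coefficient (no intercept)."""
    num = sum(a * b for a, b in zip(series[1:], series[:-1]))
    den = sum(a * a for a in series[:-1])
    return num / den


rng = random.Random(2024)
phi, sigma_eps, sigma_e = 0.8, 1.0, 1.0  # illustrative values, not from the thesis
n = 100_000

x_true = simulate_ar1(n, phi, sigma_eps, rng)
# Assumed classical additive error: X*_t = X_t + e_t, with e_t independent of X_t.
x_obs = [x + rng.gauss(0.0, sigma_e) for x in x_true]

var_x = sigma_eps ** 2 / (1.0 - phi ** 2)      # stationary variance of X_t
reliability = var_x / (var_x + sigma_e ** 2)   # share of observed variance due to X_t

phi_true_data = lag1_estimate(x_true)  # estimate from error-free data
phi_naive = lag1_estimate(x_obs)       # estimate that ignores measurement error
phi_limit = phi * reliability          # large-sample limit of the naive estimate

print(f"error-free estimate: {phi_true_data:.3f}")
print(f"naive estimate:      {phi_naive:.3f}")
print(f"attenuated limit:    {phi_limit:.3f}")
```

In this sketch the naive estimate tracks the attenuated limit rather than the true coefficient, which is the kind of bias that a correction method for error-prone time series would need to remove.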