Analysis of Correlated Data with Measurement Error in Responses or Covariates
Date
2010-09-30T18:17:34Z
Authors
Chen, Zhijian
Publisher
University of Waterloo
Abstract
Correlated data frequently arise from epidemiological studies, especially familial
and longitudinal studies. Longitudinal designs have been used by researchers to investigate changes in certain characteristics over time at the individual level, as well as how potential factors influence those changes. Familial studies are often designed to investigate the dependence of health conditions among family members. Various models have been developed for this type of multivariate data, and a wide variety
of estimation techniques have been proposed. However, data collected from observational
studies are often far from perfect, as measurement error may arise from different
sources such as defective measuring systems, diagnostic tests without gold references,
and self-reports. Under such scenarios, only rough surrogate variables are measured. Measurement error in covariates in various regression models has been discussed extensively in the literature. It is well known that naive approaches that ignore covariate error often lead to inconsistent estimators of model parameters.
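As a minimal illustration of the inconsistency of naive estimators under covariate error, the sketch below simulates a simple linear regression with classical additive error in the covariate. It is not taken from the thesis: the function name `simulate_attenuation`, the parameter values, and the assumption that the error variance is known are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_attenuation(n=200_000, beta=2.0, sigma_x=1.0, sigma_u=1.0, seed=0):
    """Compare the naive slope with a reliability-corrected slope.

    Illustrative assumptions: classical additive error W = X + U,
    with U independent of X and of the outcome noise, and the error
    variance sigma_u**2 treated as known.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, n)        # true covariate (unobserved in practice)
    w = x + rng.normal(0.0, sigma_u, n)    # error-contaminated surrogate
    y = beta * x + rng.normal(0.0, 1.0, n)

    # Naive least-squares slope from regressing y on the surrogate w.
    naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)

    # The naive slope converges to beta * reliability, so dividing by the
    # reliability ratio undoes the attenuation when that ratio is known.
    reliability = sigma_x**2 / (sigma_x**2 + sigma_u**2)
    corrected = naive / reliability
    return naive, corrected

naive, corrected = simulate_attenuation()
print(f"naive slope: {naive:.3f}, corrected slope: {corrected:.3f}")
```

With equal variances for the true covariate and the error, the reliability ratio is 0.5, so the naive slope settles near half of the true value however large the sample is.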
In this thesis, we develop inferential procedures for analyzing correlated data with
response measurement error. We consider three scenarios: (i) likelihood-based inferences for generalized linear mixed models when the continuous response is subject to nonlinear measurement errors; (ii) estimating equations methods for binary responses with misclassifications; and (iii) estimating equations methods for ordinal
responses when the response variable and categorical/ordinal covariates are subject
to misclassifications.
The first problem arises when the continuous response variable is difficult to measure.
When the true response is defined as the long-term average of measurements, a single measurement is regarded as an error-contaminated surrogate. We focus on generalized linear mixed models with nonlinear response error and study the induced bias in naive estimates. We propose likelihood-based methods that can yield consistent and efficient estimators for both fixed-effects and variance parameters. Results of simulation studies and an analysis of a data set from the Framingham Heart Study
are presented.
Marginal models have been widely used for correlated binary, categorical, and ordinal data. The regression parameters characterize the marginal mean of a single outcome, without conditioning on other outcomes or unobserved random effects. The generalized estimating equations (GEE) approach, introduced by Liang and Zeger (1986), only models the first two moments of the responses with associations being
treated as nuisance characteristics. For some clustered studies, especially familial
studies, however, the association structure may be of scientific interest. For binary
data, Prentice (1988) proposed additional estimating equations that allow one to
model pairwise correlations. We consider marginal models for correlated binary data
with misclassified responses. We develop “corrected” estimating equations approaches
that can yield consistent estimators for both mean and association parameters. The
idea is related to that of Nakamura (1990), which was originally developed to correct bias
induced by additive covariate measurement error under generalized linear models. Our approaches can also handle correlated misclassifications rather than a simple
misclassification process as considered by Neuhaus (2002) for clustered binary data
under generalized linear mixed models. We extend our methods and further develop
marginal approaches for analysis of longitudinal ordinal data with misclassification in both responses and categorical covariates. Simulation studies show that our proposed methods perform very well under a variety of scenarios. Results from application of the proposed methods to real data are presented.
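The general idea behind a "corrected" estimating equation for a misclassified binary response can be sketched in a deliberately simplified setting. The code below is not the method developed in the thesis: it assumes independent observations (no clustering or association parameters), a logistic mean model, and known misclassification probabilities `gamma0` = P(Y* = 1 | Y = 0) and `gamma1` = P(Y* = 0 | Y = 1). All names and values are hypothetical. Because E[Y* | Y] = gamma0 + (1 - gamma0 - gamma1) Y, the transformed surrogate (Y* - gamma0) / (1 - gamma0 - gamma1) is unbiased for Y, and substituting it into the logistic score, which is linear in the response, gives an unbiased estimating function.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def solve_logistic_score(X, y, n_iter=50):
    """Solve sum_i x_i * (y_i - expit(x_i' beta)) = 0 by Newton-Raphson.

    The response y is not required to lie in {0, 1}, which lets the same
    solver handle the transformed surrogate response.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = expit(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])
        beta = beta + np.linalg.solve(info, score)
    return beta

rng = np.random.default_rng(1)
n = 100_000
gamma0, gamma1 = 0.10, 0.15              # assumed known misclassification rates
beta_true = np.array([-0.5, 1.0])        # hypothetical true parameters

X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, expit(X @ beta_true))                 # true response
flip_prob = np.where(y == 1, gamma1, gamma0)
y_star = np.where(rng.random(n) < flip_prob, 1 - y, y)    # observed surrogate

# Naive analysis: treat the misclassified surrogate as the true response.
beta_naive = solve_logistic_score(X, y_star)

# Corrected analysis: replace y_star by a surrogate that is unbiased for y.
y_tilde = (y_star - gamma0) / (1.0 - gamma0 - gamma1)
beta_corrected = solve_logistic_score(X, y_tilde)

print("naive:    ", np.round(beta_naive, 3))
print("corrected:", np.round(beta_corrected, 3))
```

In this sketch the naive slope is attenuated toward zero, while the corrected estimate is close to the true value, at the cost of a larger variance than an analysis based on the true response would have.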
Measurement error can be coupled with many other features in the data, e.g., complex survey designs, that can complicate inferential procedures. We explore combining
survey weights and misclassification in ordinal covariates in logistic regression
analyses. We propose an approach that incorporates survey weights into estimating
equations to yield design-based unbiased estimators.
In the final part of the thesis we outline some directions for future work, such as
transition models and semiparametric models for longitudinal data with both incomplete
observations and measurement error. Missing data is another common feature in applications. Developing novel statistical techniques for dealing with both missing
data and measurement error can be beneficial.
Keywords
Estimating equations, Generalized mixed models, Longitudinal data, Measurement error, Odds ratio