Statistical Methods for Multi-State Analysis of Incomplete Longitudinal Data
Date
2008-09-25T18:21:24Z
Authors
Chen, Baojiang
Publisher
University of Waterloo
Abstract
Analyses of longitudinal categorical data are typically based on
semiparametric models in which covariate effects are expressed on
marginal probabilities and estimation is carried out based on
generalized estimating equations (GEE). Methods based on GEE are
motivated in part by the lack of tractable models for clustered
categorical data. However, such marginal methods may yield neither
fully efficient estimates nor, when data are missing, consistent
estimates. In the first part of the thesis I develop a Markov model
for the analysis of longitudinal categorical data that facilitates
modeling both marginal and conditional structures. A likelihood
formulation is employed for inference, so the resulting estimators
enjoy properties such as optimal efficiency and consistency, and
remain consistent when data are missing at random. Simulation
studies demonstrate that the proposed method performs well under a
variety of situations. Application to data from a smoking prevention
study illustrates the utility of the model and interpretation of
covariate effects.
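The abstract does not give the model's parametrization. As a minimal sketch, assuming a first-order Markov chain for a binary response, the code below shows how conditional (transition) probabilities determine the marginal probabilities P(Y_t = 1), and how a likelihood uses only the observed part of a sequence truncated by dropout. When dropout is missing at random and its parameters are distinct from the model's, the dropout mechanism can be ignored, which is why likelihood estimators stay consistent. The function names and the inputs mu1, p01 and p11 are hypothetical; in a regression model these probabilities would depend on covariates.

```python
import numpy as np


def marginal_probs(mu1, p01, p11):
    """Marginal P(Y_t = 1), t = 1..T, implied by a first-order binary Markov chain.

    mu1: P(Y_1 = 1).
    p01[t], p11[t]: P(Y_{t+2} = 1 | Y_{t+1} = 0) and P(Y_{t+2} = 1 | Y_{t+1} = 1),
    i.e. entry t is the transition from time t+1 to time t+2 (0-based lists).
    """
    mu = [mu1]
    for a, b in zip(p01, p11):
        prev = mu[-1]
        # Law of total probability over the state at the previous time.
        mu.append(prev * b + (1.0 - prev) * a)
    return np.array(mu)


def markov_loglik(y, mu1, p01, p11):
    """Log-likelihood of one subject's binary responses, truncated at dropout.

    Missing values (np.nan) are assumed monotone: once a response is missing,
    all later ones are too. Under missing at random, only the observed prefix
    contributes to the likelihood.
    """
    ll = 0.0
    prev = None
    for t, yt in enumerate(y):
        if np.isnan(yt):
            break  # dropout: nothing observed from here on
        if prev is None:
            p = mu1  # first observation uses the initial marginal probability
        else:
            p = p11[t - 1] if prev == 1 else p01[t - 1]
        ll += np.log(p if yt == 1 else 1.0 - p)
        prev = yt
    return ll
```

For example, with mu1 = 0.2 and a single transition p01 = 0.1, p11 = 0.5, the marginal probabilities are 0.2 and 0.2(0.5) + 0.8(0.1) = 0.18.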
Incomplete data arise in many areas of research and are particularly
common in longitudinal studies of the disease history of subjects.
Progressive models provide a convenient framework for characterizing
disease processes which arise, for example, when the state
represents the degree of the irreversible damage incurred by the
subject. Problems arise if the mechanism leading to the missing data
is related to the response process: a naive analysis that ignores
this dependence can lead to biased estimates and invalid inferences. The second part of this
thesis begins with an investigation of progressive multi-state
models for longitudinal studies with incomplete observations.
Maximum likelihood estimation is carried out with an EM
algorithm, and variances are estimated using Louis' method.
In general, maximum likelihood estimates are valid when the
missing data mechanism is missing completely at random or missing at
random. Here we provide a likelihood-based method under which the
parameters are identifiable regardless of the missing data
mechanism. Simulation studies demonstrate that the proposed method
works well under a variety of situations.
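The EM algorithm maximizes the observed-data likelihood. A minimal sketch, assuming a discrete-time three-state progressive model (states 0, 1, 2, with state 2 absorbing), every subject starting in state 0, and ignorable missed visits, shows how that likelihood can be computed: unobserved intermediate states are summed out by using the k-step transition matrix between consecutive observed visits. The transition probabilities q01 and q12 and the helper names are hypothetical. The sketch does not model a nonignorable missing data mechanism, which the thesis addresses.

```python
import numpy as np


def progressive_P(q01, q12):
    """One-step transition matrix for an illustrative 3-state progressive model.

    Transitions are irreversible (0 -> 1 -> 2) and state 2 is absorbing.
    """
    return np.array([
        [1.0 - q01, q01, 0.0],
        [0.0, 1.0 - q12, q12],
        [0.0, 0.0, 1.0],
    ])


def progressive_loglik(states, P, init=0):
    """Observed-data log-likelihood of one subject's visits 1..T.

    states: list of observed states, with None marking a missed visit.
    The subject is assumed to be in state `init` at time 0. Between two
    observed visits k steps apart, the missing states are summed out by
    using P^k. An impossible (backward) transition gives log(0) = -inf.
    """
    ll = 0.0
    last_state, last_time = init, 0
    for t, s in enumerate(states, start=1):
        if s is None:
            continue  # missed visit: integrated out via the next matrix power
        k = t - last_time
        Pk = np.linalg.matrix_power(P, k)
        ll += np.log(Pk[last_state, s])
        last_state, last_time = s, t
    return ll
```

For instance, with q01 = 0.3 and q12 = 0.2, a subject missed at visit 1 and seen in state 1 at visit 2 contributes log(0.7 * 0.3 + 0.3 * 0.8) = log(0.45), summing over both possible states at visit 1.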
In practice, data often have missing values in both the response
and the covariates, and the missingness of the response may be
associated with the missingness of the covariates. A proper analysis
of such data must take this association into account: the impact of
attrition in longitudinal studies depends on the correlation between
the missing response and the missing covariate, and ignoring this
correlation can bias statistical inference. We study a method that
incorporates the association between the missingness of the response
and of the covariate through inverse probability weighted
generalized estimating equations. Simulation studies illustrate that
the proposed method yields a consistent estimator, while the method
that ignores the association yields an inconsistent estimator.
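The inverse probability weighting idea can be sketched under simplifying assumptions: a linear mean model with an independence working correlation, solved in closed form. Here R_i = 1 indicates that both the response and the covariates of record i are observed, and pi_i is the probability of that event; the weights R_i / pi_i let the complete records stand in for the full sample. The probabilities pi_i are taken as known. In the setting described above they would have to be modeled so as to reflect the association between the missingness of the response and of the covariates, and that model is not reproduced here. The function name is hypothetical.

```python
import numpy as np


def ipw_linear_ee(X, y, observed, pi):
    """Solve sum_i (R_i / pi_i) x_i (y_i - x_i' beta) = 0 for beta.

    X: (n, p) covariates; y: (n,) responses; observed: (n,) 0/1 indicators R_i
    (1 = response and covariates both observed); pi: (n,) P(R_i = 1).
    Values of X and y in unobserved records are ignored (they may be np.nan).
    Identity link and independence working correlation are assumed.
    """
    observed = np.asarray(observed, dtype=float)
    w = observed / pi
    # Zero out unobserved entries so that NaNs cannot leak into the sums.
    y = np.where(observed == 1, y, 0.0)
    X = np.where(observed[:, None] == 1, X, 0.0)
    XtW = X.T * w  # (p, n): column i of X' scaled by weight w_i
    return np.linalg.solve(XtW @ X, XtW @ y)
```

With no missing data and all pi_i = 1, the equation reduces to ordinary least squares. With missing data, records that had a low chance of being observed receive larger weights.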
Many analyses of incomplete longitudinal data focus on the
impact of covariates on the mean responses. However, little
attention has been paid to the impact of missing
covariates on the association parameters in clustered longitudinal
studies. The last part of this thesis addresses this problem:
weighted first- and second-order estimating equations are constructed
to obtain consistent estimates of the mean and association parameters.
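For the association parameters, one simple second-order estimating equation, assumed here for illustration, targets a common (exchangeable) within-cluster correlation rho from products of standardized residuals of a fitted mean model. Each observed pair is weighted by the inverse of its observation probability. Taking that probability to be pi_ij * pi_ik treats the two observation indicators as independent, which is a simplifying assumption; in general the joint probability P(R_ij = R_ik = 1) is needed. The function name and inputs are hypothetical.

```python
import numpy as np


def weighted_exchangeable_corr(resid, observed, pi):
    """Weighted moment estimate of a common within-cluster correlation rho.

    resid: (n, m) standardized residuals e_ij = (y_ij - mu_ij) / sd_ij
    observed: (n, m) 0/1 observation indicators R_ij
    pi: (n, m) observation probabilities P(R_ij = 1)
    Each pair (j, k) in cluster i contributes w_ijk * (e_ij * e_ik - rho) = 0,
    with w_ijk = R_ij R_ik / (pi_ij pi_ik) (independence of R_ij and R_ik
    assumed). Solving the summed equation gives a weighted average.
    """
    n, m = resid.shape
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(m):
            for k in range(j + 1, m):
                if observed[i, j] and observed[i, k]:
                    w = 1.0 / (pi[i, j] * pi[i, k])
                    num += w * resid[i, j] * resid[i, k]
                    den += w
    return num / den
```

For example, with two clusters whose residual products are 1 and -1 and pair weights 4 and 1, the estimate is (4 - 1) / 5 = 0.6, whereas equal weights would give 0.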