Longitudinal Data Analysis with Composite Likelihood Methods
MetadataShow full item record
Longitudinal data arise commonly in many fields including public health studies and survey sampling. Valid inference methods for longitudinal data are of great importance in scientific researches. In longitudinal studies, data collection are often designed to follow all the interested information on individuals at scheduled times. The analysis in longitudinal studies usually focuses on how the data change over time and how they are associated with certain risk factors or covariates. Various statistical models and methods have been developed over the past few decades. However, these methods could become invalid when data possess additional features. First of all, incompleteness of data presents considerable complications to standard modeling and inference methods. Although we hope each individual completes all of the scheduled measurements without any absence, missing observations occur commonly in longitudinal studies. It has been documented that biased results could arise if such a feature is not properly accounted for in the analysis. There has been a large body of methods in the literature on handling missingness arising either from response components or covariate variables, but relatively little attention has been directed to addressing missingness in both response and covariate variables simultaneously. Important reasons for the sparsity of the research on this topic may be attributed to substantially increased complexity of modeling and computational difficulties. In Chapter 2 and Chapter 3 of the thesis, I develop methods to handle incomplete longitudinal data using the pairwise likelihood formulation. The proposed methods can handle longitudinal data with missing observations in both response and covariate variables. A unified framework is invoked to accommodate various types of missing data patterns. The performance of the proposed methods is carefully assessed under a variety of circumstances. In particular, issues on efficiency and robustness are investigated. Longitudinal survey data from the National Population Health Study are analyzed with the proposed methods. The other difficulty in longitudinal data is model selection. Incorporating a large number of irrelevant covariates to the model may result in computation, interpretation and prediction difficulties, thus selecting parsimonious models are typically desirable. In particular, the penalized likelihood method is commonly employed for this purpose. However, when we apply the penalized likelihood approach in longitudinal studies, it may involve high dimensional integrals which are computationally expensive. We propose an alternative method using the composite likelihood formulation. Formulation of composite likelihood requires only a partial structure of the correlated data such as marginal or pairwise distributions. This strategy shows modeling tractability and computational cheapness in model selection. Therefore, in Chapter 4 of this thesis, I propose a composite likelihood approach with penalized function to handle the model selection issue. In practice, we often face the model selection problem not only from choosing proper covariates for regression predictor, but also from the component of random effects. Furthermore, the specification of random effects distribution could be crucial to maintain the validity of statistical inference. Thus, the discussion on selecting both covariates and random effects as well as misspecification of random effects are also included in Chapter 4. Chapter 5 of this thesis mainly addresses the joint features of missingness and model selection. I propose a specific composite likelihood method to handle this issue. A typical advantage of the approach is that the inference procedure does not involve explicit missing process assumptions and nuisance parameters estimation.