An Investigation of Methods for Missing Data in Hierarchical Models for Discrete Data

Ahmed, Muhamad Rashid

Files

Ahmed_Muhammad.pdf (1020.09 KB)

Date

2011-03-09T19:44:21Z

Authors

Ahmed, Muhamad Rashid

Publisher

University of Waterloo

Abstract

Hierarchical models are suited to data from complex surveys or longitudinal studies in which a clustered or multistage sample design is employed. The focus of this thesis is inference for discrete hierarchical models in the presence of missing data. The thesis is divided into two parts. In the first part, methods are developed to analyze discrete and ordinal response data from hierarchical longitudinal studies. Several approximation methods have been developed to estimate the parameters for the fixed and random effects in the context of generalized linear models. The thesis focuses on two likelihood-based estimation procedures, the pseudo likelihood (PL) method and the adaptive Gaussian quadrature (AGQ) method. The simulation results suggest that AGQ is preferable to PL when the goal is to estimate the variance of the random intercept in a complex hierarchical model: AGQ gives smaller biases for the estimate of that variance, and it permits greater flexibility in accommodating user-defined likelihood functions. In the second part, simulated data are used to develop a method for modeling longitudinal binary data when non-response depends on unobserved responses. The simulation study models three-level discrete hierarchical data with 30% and 40% missing data under a missing not at random (MNAR) missing-data mechanism, and it focuses on a monotone missing-data pattern. The methods compared in this thesis are: complete case analysis (CCA), last observation carried forward (LOCF), available case missing value (ACMVPM) restriction, complete case missing value (CCMVPM) restriction, neighboring case missing value (NCMVPM) restriction, selection model with predictive mean matching method (SMPM), and Bayesian pattern mixture model. All three restriction methods and the selection model use the predictive mean matching method to impute missing data. Multiple imputation is used to impute the missing values.
The m imputed values for each missing observation produce m completed datasets. Each dataset is analyzed and the parameters are estimated. The results from the m analyses are then combined using the method of Rubin (1987), and inferences are made from the combined results. Our results suggest that the restriction methods provide results that are superior to those of the other methods. The selection model yields smaller biases than the LOCF method, but as the proportion of missing data increases it no longer outperforms LOCF. Among the three restriction methods, the ACMVPM method performs best. The proposed method provides an alternative to standard selection and pattern-mixture modeling frameworks when data are not missing at random. This method is applied to data from the third Waterloo Smoking Project, a seven-year smoking prevention study having substantial non-response due to loss to follow-up.
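
The combining step attributed to Rubin (1987) in the abstract can be illustrated with a short sketch. It is not the thesis's own code: the function name `pool_rubin` and the numbers in the usage lines are hypothetical, chosen only for illustration. It assumes each of the m completed datasets has yielded a point estimate and an estimated variance for one scalar parameter, and it applies the standard combining rules (pooled estimate = mean of the m estimates; total variance = within-imputation variance + (1 + 1/m) × between-imputation variance).

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules.

    estimates: one point estimate per completed dataset
    variances: the matching estimated variances (squared standard errors)
    Returns (pooled_estimate, total_variance, within_var, between_var).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")

    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total, within, between


# Hypothetical values for m = 3 completed datasets (not results from the thesis).
pooled, total, within, between = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The between-imputation term is what carries the extra uncertainty due to the missing data: if all m analyses agreed exactly, the total variance would reduce to the average within-imputation variance.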