Analysis of Longitudinal Surveys with Missing Responses
Carrillo Garcia, Ivan Adolfo
MetadataShow full item record
Longitudinal surveys have emerged in recent years as an important data collection tool for population studies where the primary interest is to examine population changes over time at the individual level. The National Longitudinal Survey of Children and Youth (NLSCY), a large scale survey with a complex sampling design and conducted by Statistics Canada, follows a large group of children and youth over time and collects measurement on various indicators related to their educational, behavioral and psychological development. One of the major objectives of the study is to explore how such development is related to or affected by familial, environmental and economical factors. The generalized estimating equation approach, sometimes better known as the GEE method, is the most popular statistical inference tool for longitudinal studies. The vast majority of existing literature on the GEE method, however, uses the method for non-survey settings; and issues related to complex sampling designs are ignored. This thesis develops methods for the analysis of longitudinal surveys when the response variable contains missing values. Our methods are built within the GEE framework, with a major focus on using the GEE method when missing responses are handled through hot-deck imputation. We first argue why, and further show how, the survey weights can be incorporated into the so-called Pseudo GEE method under a joint randomization framework. The consistency of the resulting Pseudo GEE estimators with complete responses is established under the proposed framework. The main focus of this research is to extend the proposed pseudo GEE method to cover cases where the missing responses are imputed through the hot-deck method. Both weighted and unweighted hot-deck imputation procedures are considered. The consistency of the pseudo GEE estimators under imputation for missing responses is established for both procedures. Linearization variance estimators are developed for the pseudo GEE estimators under the assumption that the finite population sampling fraction is small or negligible, a scenario often held for large scale population surveys. Finite sample performances of the proposed estimators are investigated through an extensive simulation study. The results show that the pseudo GEE estimators and the linearization variance estimators perform well under several sampling designs and for both continuous response and binary response.