Fractional Imputation for Ordinal and Mixed-type Responses with Missing Observations

Date

2017-01-12

Authors

She, Xichen

Advisor

Wu, Changbao

Publisher

University of Waterloo

Abstract

This thesis addresses two essential aspects of large-scale public-use data files involving ordinal and mixed-type responses with missing observations: (i) the creation of single complete data sets with imputation for missing values; and (ii) the statistical analysis of imputed data sets by public data users with different objectives. Large-scale data sets are typically collected by statistical agencies, research institutes or commercial organizations, and missing observations are a common feature. Our research focuses on scenarios where one ordinal response or several mixed-type responses are part of the data sets and are subject to missingness. We develop a sequential regression fractional imputation procedure to create single complete data sets that support valid and efficient statistical analysis for inferential problems commonly encountered by public data users.

Ordinal variables are widely collected and analyzed in many scientific fields. They share some common tools with discrete data analysis but have a much richer structure to explore than general categorical variables. More importantly, statistical methods developed for ordinal variables can be readily extended to cover categorical data. In this thesis, we present the sequential regression fractional imputation strategy through three major research projects, starting from ordinal variables and extending to mixed-type responses. The proposed method takes into account unique features of ordinal responses and is theoretically sound and practically appealing.

The first project considers a simple scenario where there is only one ordinal response with missing values. We provide detailed steps for the proposed imputation procedure and develop asymptotic properties of subsequent estimators derived under a general setting. We discuss in detail three inferential problems of practical importance: (1) estimation of category probabilities; (2) regression analysis using all available covariates; and (3) regression analysis involving a subset of the covariates. For each problem, the proposed procedure is compared with existing alternative methods in terms of validity and efficiency of the analysis. Finite-sample performance is demonstrated through simulation studies.

The second research project extends the proposed procedure to more complex scenarios where multiple variables of mixed types, including continuous, ordered and unordered categorical variables, all contain missing observations. We outline the key steps of the sequential regression fractional imputation procedure under general settings and present asymptotic results on statistical analysis through two specific inferential problems: (1) a test of independence for two ordinal responses via association measures; and (2) regression of an ordinal response on continuous covariates where both the response and the covariates are subject to missingness. Simulation studies reveal that the proposed procedure provides superior results compared to existing methods.

In the third research project, we study the robustness of the estimators for marginal population quantities by incorporating missing data mechanisms into the proposed procedure. Two cases are considered: a univariate ordinal response with missing values, and longitudinal ordinal responses with monotone missingness. We show the power of the proposed procedure through an application to a causal inference problem in a point-treatment study. Simulation results confirm the double robustness property of the estimators for marginal population quantities computed from the fractionally imputed data sets: the estimators are robust against misspecification of either the imputation models or the response probability models.
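As a rough illustration of the general idea behind fractional imputation for an ordinal response, the sketch below replaces each missing response with one record per category, weighted by an estimated conditional category probability, and then estimates category probabilities from the single completed data set. It is not the thesis's procedure: the thesis uses sequential regression models, whereas this sketch assumes a single discrete covariate and swaps in simple within-cell respondent frequencies as the imputation model. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def fractional_impute(records, categories):
    """Return a completed data set as (x, y, weight) triples.

    records: list of (x, y) pairs, with y set to None when missing.
    Assumption for this sketch: P(Y = j | x) is estimated by the
    respondent frequencies within each covariate cell x.
    """
    # Count observed categories within each covariate cell.
    counts = defaultdict(Counter)
    for x, y in records:
        if y is not None:
            counts[x][y] += 1

    completed = []
    for x, y in records:
        if y is not None:
            completed.append((x, y, 1.0))  # observed unit keeps full weight
            continue
        n_cell = sum(counts[x].values())
        if n_cell == 0:
            raise ValueError(f"no respondents in cell {x!r}")
        # One fractional record per category; weights sum to 1 per unit.
        for j in categories:
            w = counts[x][j] / n_cell
            if w > 0:
                completed.append((x, j, w))
    return completed


def category_probabilities(completed, categories):
    """Weighted estimate of P(Y = j) from the completed data set."""
    total = sum(w for _, _, w in completed)
    return {
        j: sum(w for _, y, w in completed if y == j) / total
        for j in categories
    }


# Hypothetical toy data: covariate cell label and ordinal response in {1, 2, 3}.
records = [
    ("a", 1), ("a", 1), ("a", 2), ("a", 3), ("a", None),
    ("b", 2), ("b", 3), ("b", 3), ("b", None), ("b", None),
]
categories = [1, 2, 3]

completed = fractional_impute(records, categories)
probs = category_probabilities(completed, categories)
print(probs)
```

Because each missing unit contributes fractional records whose weights sum to one, the completed file can be analyzed with ordinary weighted methods, which is the property that makes a single imputed data set usable by different public data users.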

Keywords

public-use data, missing data, fractional imputation, ordinal responses, double robustness
