Causal Inference with Measurement Error

Shu, Di

Files

Shu_Di.pdf (1.42 MB)

Date

2018-05-04

Authors

Shu, Di

Advisor

Yi, Grace Y.

Publisher

University of Waterloo

Abstract

Causal inference methods are widely used in the biomedical and social sciences, among many other fields. Various methods, resting on different assumptions, have been proposed to conduct causal inference with interpretable results. The validity of most existing methods, if not all, relies on a crucial condition: all variables must be precisely measured. This condition, however, is commonly violated. In many applications, the collected data are subject to measurement error. Ignoring measurement error effects can lead to severely biased results and misleading conclusions, so these effects should be carefully addressed to obtain reliable inference. Outside the context of causal inference, research on measurement error problems has been extensive and a large body of methods has been developed. Within the causal inference paradigm, however, research on measurement error remains limited, although a small but growing literature has emerged. This area deserves in-depth investigation. Motivated by this, this thesis focuses on causal inference with measurement error. We investigate the impact of measurement error and propose methods that correct for its effects in several useful settings.

This thesis consists of nine chapters. As a preliminary, Chapter 1 introduces causal inference, measurement error and other data features such as missing data, and gives an overview of existing methods for causal inference with measurement error. This chapter also describes the problems that are investigated in depth in subsequent chapters.

Chapter 2 considers estimation of the causal odds ratio, the causal risk ratio and the causal risk difference in the presence of measurement error in confounders, which may be time-varying. By adapting two measurement error correction methods applicable in the noncausal context, we propose valid methods that consistently estimate these causal effect measures in settings with error-prone confounders. Furthermore, we develop a linear-combination-based method to construct estimators with improved asymptotic efficiency.

Chapter 3 focuses on inverse-probability-of-treatment weighted (IPTW) estimation of causal parameters under marginal structural models with error-contaminated and time-varying confounders. To account for bias due to imprecise measurements, we develop several correction methods. Both the stabilized and the unstabilized weighting strategies are covered in the development.

In Chapter 4, measurement error in outcomes is of concern. For inverse probability weighting (IPW) estimation, we study the impact of measurement error on both continuous and binary outcome variables and reveal interesting consequences of the naive analysis, which ignores measurement error. When a continuous outcome variable is mismeasured under an additive measurement error model, the naive analysis may still yield a consistent estimator; when the outcome is binary, we derive the asymptotic bias in closed form. Furthermore, we develop consistent estimation procedures for practical scenarios where either validation data or replicates are available. With validation data, we propose an efficient method. To protect against model misspecification, we further develop a doubly robust estimator, which remains consistent when either the treatment model or the outcome model is misspecified, provided the other is correctly specified.

Chapter 5 deals with measurement error arising from more than one source. We study IPW estimation in settings with mismeasured covariates and misclassified outcomes. To correct for measurement error and misclassification effects simultaneously, we develop two estimation methods that accommodate different forms of the treatment model. Our discussion covers a broad class of treatment models, including the commonly assumed logistic regression models as well as general treatment assignment mechanisms.

Chapters 2-5 emphasize addressing measurement error effects on causal inference. In applications, additional data features may pose further challenges. For instance, missing values frequently occur during data collection, in addition to measurement error. In Chapter 6, we investigate the problem in which both missingness and misclassification may be present in a binary outcome variable. We consider IPW estimation in particular and derive the asymptotic biases of three types of naive analysis, which ignore missingness, misclassification, or both. We develop valid estimation methods that correct for missingness and misclassification effects simultaneously. To protect against model misspecification, we further propose a doubly robust correction method.

The doubly robust estimators developed in Chapter 6 offer a viable way to address model misspecification and can be easily applied to practical problems. This appealing property does not, however, mean that doubly robust estimators are without weakness: when both the treatment model and the outcome model are misspecified, such estimators are not necessarily consistent. Driven by this consideration, in Chapter 7 we propose new estimation methods to correct for the effects of misclassification and/or missingness in outcomes. Unlike the doubly robust estimators, which are constructed from a single treatment model and a single outcome model, the new methods consider a set of treatment models and a set of outcome models. Enlarging the associated models in this way enables us to construct consistent estimators that enjoy so-called multiple robustness, a property that has been discussed in the missing data literature.

To expedite the application of our methods, we implement the methods proposed in Chapter 4 in an R package for general users. The details are given in Chapter 8. The thesis concludes with a discussion in Chapter 9.
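
As an informal illustration of the outcome measurement error setting summarized for Chapter 4, the following Python sketch simulates naive IPW estimation with a mismeasured outcome. It is not the thesis's method or its R package. Everything in it is an assumption made for illustration: the data-generating models, the parameter values, the function names, the use of the true propensity score in place of an estimated one, and nondifferential misclassification with known sensitivity and specificity. Under those assumptions, additive mean-zero error in a continuous outcome should leave the naive IPW estimate close to the true effect, while misclassification of a binary outcome should shrink the naive estimate by roughly the factor (sensitivity + specificity - 1). Dividing by that factor is a standard textbook correction and is shown only for comparison.

```python
import math
import random


def expit(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def ipw_ate(treat, outcome, ps):
    """IPW estimate of E[Y(1)] - E[Y(0)] using known propensity scores."""
    n = len(treat)
    mu1 = sum(t * y / e for t, y, e in zip(treat, outcome, ps)) / n
    mu0 = sum((1 - t) * y / (1.0 - e) for t, y, e in zip(treat, outcome, ps)) / n
    return mu1 - mu0


def simulate(n=200_000, seed=1, sens=0.9, spec=0.8):
    """Hypothetical simulation: one confounder x, binary treatment t."""
    rng = random.Random(seed)
    treat, ps = [], []
    y_cont_obs, y_bin, y_bin_obs = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)            # confounder
        e = expit(0.5 * x)                 # true propensity score (assumed known)
        t = 1 if rng.random() < e else 0
        # Continuous outcome with true average treatment effect 2.0,
        # observed with additive mean-zero error independent of t and x.
        y_c = 1.0 + 2.0 * t + x + rng.gauss(0.0, 1.0)
        y_c_obs = y_c + rng.gauss(0.0, 1.0)
        # Binary outcome, then nondifferential misclassification:
        # a true 1 is kept with probability sens, a true 0 with probability spec.
        y_b = 1 if rng.random() < expit(-0.5 + t + x) else 0
        keep = sens if y_b == 1 else spec
        y_b_obs = y_b if rng.random() < keep else 1 - y_b
        treat.append(t)
        ps.append(e)
        y_cont_obs.append(y_c_obs)
        y_bin.append(y_b)
        y_bin_obs.append(y_b_obs)
    bin_naive = ipw_ate(treat, y_bin_obs, ps)
    return {
        "cont_naive": ipw_ate(treat, y_cont_obs, ps),    # mismeasured continuous outcome
        "bin_true": ipw_ate(treat, y_bin, ps),           # error-free binary outcome
        "bin_naive": bin_naive,                          # misclassified binary outcome
        "bin_corrected": bin_naive / (sens + spec - 1.0),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Running the script prints the four estimates, so the naive and corrected values can be compared against the error-free benchmark. The sketch covers only the simplest case; validation data, replicates, estimated treatment models and doubly robust estimation are outside its scope.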