Causal Inference with Covariate Balance Optimization

Xie, Yuying

Files

Xie_Yuying.pdf (1.1 MB)

Date

2018-12-04

Authors

Xie, Yuying

Advisor

Zhu, Yeying
Cotton, Cecilia

Publisher

University of Waterloo

Abstract

Causal inference is a widely studied problem in biostatistics, economics, and health science studies. The goal of this thesis is to develop new methods for estimating causal effects using propensity scores or inverse probability weights, where the weights are chosen to achieve covariate balance across the treatment groups. In Chapter 1, we introduce the Neyman-Rubin causal framework and causal inference with propensity scores, and further discuss the importance of covariate balancing in causal inference. In Chapter 2, we provide general definitions and notation for causal inference, together with other popular propensity score approaches and weighting techniques. In Chapter 3, we describe a new model averaging approach to propensity score estimation in which parametric and nonparametric estimates are combined to achieve covariate balance. Simulation studies are conducted across different scenarios varying in the degree of interactions and nonlinearity in the treatment model. The results show that the proposed method produces less bias and smaller standard errors than existing approaches. They also show that a model averaging approach with the objective of minimizing the average Kolmogorov-Smirnov statistic leads to the best performance. The proposed approach is applied to a real data set to evaluate the causal effect of formula or mixed feeding versus exclusive breastfeeding in the first month of life on a child's BMI Z-score at age 4. The data analysis shows that formula or mixed feeding is more likely to lead to obesity at age 4 than exclusive breastfeeding. In Chapter 4, we propose using kernel distance to measure balance across treatment groups and propose a new propensity score estimator obtained by setting the kernel distance to zero.
Compared with other balance measures, such as the absolute standardized mean difference (ASMD) and the Kolmogorov-Smirnov (KS) statistic, kernel distance is one of the best indicators of bias: the balance metric based on kernel distance is shown to have the strongest correlation with the absolute bias in estimating the causal effect among several commonly used balance metrics. The kernel distance constraints are solved by the generalized method of moments. Simulation studies are conducted across different scenarios varying in the degree of nonlinearity in both the propensity score model and the outcome model. The proposed approach produces smaller mean squared error in estimating causal treatment effects than many existing approaches, including the well-known covariate balancing propensity score (CBPS) approach, when the propensity score model is misspecified. An application to data from the International Tobacco Control (ITC) Policy Evaluation Project is provided. Often interest lies in quantities other than the average causal effect, such as quantiles of the potential outcomes or the quantile treatment effect. In Chapter 5, we propose a multiply robust method for estimating marginal quantiles of potential outcomes by achieving mean balance in (1) the propensity score and (2) the conditional distributions of potential outcomes. An empirical likelihood or entropy measure can be utilized instead of inverse probability weighting. Simulation studies are conducted across different scenarios of correctness in both the propensity score models and the outcome models. Our estimator is consistent if any one of the models is correctly specified.
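
For readers unfamiliar with the balance measures named in the abstract, the following is a minimal illustrative sketch of how inverse probability weights and two of those measures (the ASMD and the KS statistic) are commonly computed for a single covariate. It is not taken from the thesis: the function names `ipw_weights`, `weighted_asmd`, and `weighted_ks` are hypothetical, the pooled unweighted standard deviation used as the ASMD denominator is only one of several conventions, and the simulated data in the demo are an assumption made purely for illustration.

```python
import numpy as np


def ipw_weights(treat, ps):
    """Inverse probability weights targeting the average treatment effect."""
    treat = np.asarray(treat, dtype=float)
    ps = np.asarray(ps, dtype=float)
    return treat / ps + (1.0 - treat) / (1.0 - ps)


def weighted_asmd(x, treat, w):
    """Absolute standardized mean difference of one covariate after weighting."""
    x, treat, w = map(np.asarray, (x, treat, w))
    t, c = treat == 1, treat == 0
    m1 = np.average(x[t], weights=w[t])
    m0 = np.average(x[c], weights=w[c])
    # Assumption: standardize by the pooled unweighted standard deviation.
    sd = np.sqrt((x[t].var(ddof=1) + x[c].var(ddof=1)) / 2.0)
    return abs(m1 - m0) / sd


def weighted_ks(x, treat, w):
    """Largest gap between the weighted empirical CDFs of the two groups."""
    x, treat, w = map(np.asarray, (x, treat, w))
    grid = np.unique(x)  # sorted distinct covariate values

    def ecdf(mask):
        xs, ws = x[mask], w[mask]
        return np.array([ws[xs <= g].sum() for g in grid]) / ws.sum()

    return float(np.max(np.abs(ecdf(treat == 1) - ecdf(treat == 0))))


if __name__ == "__main__":
    # Hypothetical simulated example: treatment depends on one covariate.
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    ps = 1.0 / (1.0 + np.exp(-x))  # true propensity score, known by design
    treat = rng.binomial(1, ps)
    w = ipw_weights(treat, ps)
    ones = np.ones_like(w)
    print("ASMD unweighted:", weighted_asmd(x, treat, ones))
    print("ASMD weighted:  ", weighted_asmd(x, treat, w))
    print("KS unweighted:  ", weighted_ks(x, treat, ones))
    print("KS weighted:    ", weighted_ks(x, treat, w))
```

In a sketch like this, smaller values of either statistic after weighting indicate better balance on that covariate; in practice the propensity score would be estimated rather than known, and the measures would be averaged or summarized across all covariates.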