Empirical Likelihood Methods for Causal Inference

Date

2024-08-21

Advisor

Wu, Changbao
Zeng, Leilei

Publisher

University of Waterloo

Abstract

This thesis develops empirical likelihood methods for causal inference, focusing on the estimation of and inference for the average treatment effect (ATE) and the causal quantile treatment effect (QTE). Causal inference has been a critical research area for decades because it is essential for understanding the true impact of interventions, policies, or actions; it enables informed decision-making and provides insight into the mechanisms shaping our world. However, directly comparing responses between treatment and control groups can yield invalid results because of potential confounders in treatment assignment. In Chapter 1, we introduce fundamental concepts in causal inference under the widely adopted potential outcome framework and discuss the challenges posed by observational studies. We formulate our research problems concerning the estimation of and inference for the ATE and review some commonly used methods for ATE estimation. Chapter 2 provides a brief review of traditional empirical likelihood methods, followed by the pseudo-empirical likelihood (PEL) and sample empirical likelihood (SEL) approaches to one-sample problems in survey sampling. In Chapter 3, we propose two inferential procedures for the ATE using a two-sample PEL approach. The first procedure uses estimated propensity scores to formulate the PEL function, yielding a maximum PEL estimator of the ATE that is equivalent to the inverse probability weighted (IPW) estimator discussed in the literature. Here we focus on the PEL ratio statistic and establish its theoretical properties. The second procedure incorporates outcome regression models into PEL inference through model-calibration constraints, and the resulting maximum PEL estimator of the ATE is doubly robust. Our main theoretical result in this case establishes the asymptotic distribution of the PEL ratio statistic.
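The IPW and doubly robust estimators mentioned above have standard textbook forms. The Python sketch below, which assumes NumPy and uses simulated confounded data, computes a normalized (Hájek-type) IPW estimate and an augmented IPW (AIPW) estimate, the usual doubly robust benchmark. It is not the thesis's PEL procedure: the function names are illustrative, the normalized form is chosen because empirical likelihood weights sum to one (whether it matches the thesis's exact estimator is an assumption), and for brevity the true propensity scores and outcome regressions are plugged in where estimated ones would be used in practice.

```python
import numpy as np


def ipw_ate(y, t, e):
    """Normalized (Hajek-type) inverse probability weighted estimate of the ATE.

    y: observed outcomes; t: treatment indicators (1 = treated, 0 = control);
    e: propensity scores P(T = 1 | X) for every unit.
    """
    y, t, e = (np.asarray(a, dtype=float) for a in (y, t, e))
    w1 = t / e                          # inverse-probability weights for treated units
    w0 = (1 - t) / (1 - e)              # inverse-probability weights for control units
    mu1 = np.sum(w1 * y) / np.sum(w1)   # weighted mean estimating E[Y(1)]
    mu0 = np.sum(w0 * y) / np.sum(w0)   # weighted mean estimating E[Y(0)]
    return mu1 - mu0


def aipw_ate(y, t, e, m1, m0):
    """Augmented IPW (doubly robust) estimate of the ATE.

    m1, m0: outcome-regression predictions of E[Y | X, T = 1] and E[Y | X, T = 0]
    for every unit. The estimate is consistent if either the propensity score
    model or the outcome regression models are correctly specified.
    """
    y, t, e, m1, m0 = (np.asarray(a, dtype=float) for a in (y, t, e, m1, m0))
    psi1 = m1 + t * (y - m1) / e               # prediction plus weighted treated residual
    psi0 = m0 + (1 - t) * (y - m0) / (1 - e)   # prediction plus weighted control residual
    return np.mean(psi1 - psi0)


# Simulated confounded data: X raises both the chance of treatment and the outcome.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-x))
t = rng.binomial(1, e_true)
y = 1.0 + 2.0 * t + 1.5 * x + rng.normal(size=n)   # true ATE = 2

naive = y[t == 1].mean() - y[t == 0].mean()        # biased by confounding
ipw = ipw_ate(y, t, e_true)
aipw = aipw_ate(y, t, e_true, m1=3.0 + 1.5 * x, m0=1.0 + 1.5 * x)
print(f"naive={naive:.3f}  ipw={ipw:.3f}  aipw={aipw:.3f}")
```

The naive difference in means is pulled away from 2 because treated units tend to have larger X, while both weighted estimators correct for this.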
We also propose a bootstrap method for constructing PEL ratio confidence intervals for the ATE that bypasses the scaling constant appearing in the asymptotic distribution of the PEL ratio statistic, a constant that is very difficult to calculate. The finite-sample performance of the proposed methods is investigated and compared with that of existing methods through simulation studies. We also present a real data analysis that uses the proposed methods to examine the ATE of maternal smoking during pregnancy on birth weight. In Chapter 4, we explore two SEL-based approaches to the estimation of and inference for the ATE. Both use a traditional two-sample empirical likelihood function but incorporate propensity scores in different ways. The first approach introduces propensity-score-calibrated constraints alongside the standard model-calibration constraints, while the second uses propensity scores to form weighted versions of the model-calibration constraints. Both approaches yield doubly robust estimators, and we derive the limiting distributions of the two SEL ratio statistics to facilitate the construction of confidence intervals and hypothesis tests for the ATE. Bootstrap methods for constructing SEL ratio confidence intervals are also discussed for both approaches. We investigate the finite-sample performance of the methods through simulation studies. While inference on the ATE is an important problem with many practical applications, analyzing the QTE is equally important because it reveals how the impact of an intervention differs across population segments. In Chapter 5, we extend the PEL approach and the two SEL approaches from Chapters 3 and 4, each augmented with model-calibration constraints, to develop doubly robust estimators of the QTE. Two types of model-calibration constraints are proposed: one based on multiple imputation of potential outcomes and the other on direct modeling of indicator functions.
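The idea behind a model-calibration constraint can be illustrated with the simplest empirical likelihood problem: choose weights p_i on one treatment group that maximize sum(log p_i) subject to sum(p_i) = 1 and sum(p_i * (m(x_i) - m_bar)) = 0, where m is a working outcome-regression prediction and m_bar is its mean over the full sample. The Python sketch below is a hypothetical one-constraint version (assuming NumPy; `el_weights` and `calibrated_mean` are illustrative names) that solves the Lagrange-multiplier equation by bisection. It omits the propensity-score constraints or weighting that make the thesis's estimators doubly robust, so on its own it is consistent only when the outcome model is correct, as it is in the simulated data.

```python
import numpy as np


def el_weights(g, tol=1e-12, max_iter=200):
    """Empirical likelihood weights: maximize sum(log p_i) subject to
    sum(p_i) = 1 and sum(p_i * g_i) = 0 (one calibration constraint).

    The solution is p_i = 1 / (n * (1 + lam * g_i)), where lam solves
    sum(g_i / (1 + lam * g_i)) = 0. That function is strictly decreasing in lam
    on the feasible interval, so bisection finds the unique root.
    """
    g = np.asarray(g, dtype=float)
    if not (g.min() < 0 < g.max()):
        raise ValueError("0 must lie strictly inside the range of g")
    lo, hi = -1 / g.max(), -1 / g.min()   # keeps 1 + lam * g_i > 0 for all i
    shrink = 1e-10 * (hi - lo)
    lo, hi = lo + shrink, hi - shrink
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        if np.sum(g / (1 + lam * g)) > 0:
            lo = lam                      # root lies to the right
        else:
            hi = lam                      # root lies to the left (or at lam)
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    p = 1 / (len(g) * (1 + lam * g))
    return p / p.sum()                    # remove tiny numerical drift


def calibrated_mean(y, in_group, m):
    """EL-weighted mean of y over one group, with weights calibrated so that the
    weighted group mean of the prediction m equals its full-sample mean."""
    g = m[in_group] - m.mean()
    p = el_weights(g)
    return np.sum(p * y[in_group])


rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = 1.0 + 2.0 * t + 1.5 * x + rng.normal(size=n)   # true ATE = 2
m1, m0 = 3.0 + 1.5 * x, 1.0 + 1.5 * x               # assumed (correct) outcome models
ate = calibrated_mean(y, t == 1, m1) - calibrated_mean(y, t == 0, m0)
print(f"calibrated EL estimate of the ATE: {ate:.3f}")
```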
For each of the six formulations in Chapter 5 (the three likelihood approaches, each combined with the two types of constraints), we compute two types of bootstrap-calibrated confidence intervals, one based on point estimators and the other on empirical likelihood ratios. We also discuss computational challenges and present simulation results. Our proposed approaches can integrate multiple working models, which facilitates the development of multiply robust estimators and distinguishes our methods from existing approaches. Chapter 6 summarizes the contributions of this thesis and outlines topics for future research.
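For the QTE, a simple point of comparison is the IPW plug-in estimator, which takes the difference between propensity-weighted quantiles of the treated and control outcomes, paired with a percentile bootstrap interval built from point estimates. The Python sketch below assumes NumPy, known propensity scores, and resampling of individual units; it shows these ingredients only, not the thesis's doubly robust, model-calibrated empirical likelihood intervals, and all helper names are hypothetical.

```python
import numpy as np


def weighted_quantile(y, w, tau):
    """tau-quantile of the weighted empirical CDF: smallest y with F_w(y) >= tau."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    order = np.argsort(y)
    y_sorted = y[order]
    cdf = np.cumsum(w[order]) / np.sum(w)
    idx = np.searchsorted(cdf, tau, side="left")
    return y_sorted[min(idx, len(y_sorted) - 1)]   # guard against rounding near tau = 1


def ipw_qte(y, t, e, tau):
    """IPW plug-in estimate of the tau-quantile treatment effect."""
    y, t, e = (np.asarray(a, dtype=float) for a in (y, t, e))
    treated = t == 1
    q1 = weighted_quantile(y[treated], 1 / e[treated], tau)
    q0 = weighted_quantile(y[~treated], 1 / (1 - e[~treated]), tau)
    return q1 - q0


def percentile_bootstrap_ci(estimator, arrays, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample units with replacement and take the
    alpha/2 and 1 - alpha/2 quantiles of the replicated estimates."""
    rng = np.random.default_rng(seed)
    arrays = [np.asarray(a) for a in arrays]
    n = len(arrays[0])
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)            # same indices for every array
        reps[b] = estimator(*(a[idx] for a in arrays))
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return lo, hi


rng = np.random.default_rng(2)
n = 10000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))
t = rng.binomial(1, e)
y = 1.0 + 2.0 * t + 1.5 * x + rng.normal(size=n)   # constant shift: QTE = 2 at every tau

median_qte = ipw_qte(y, t, e, tau=0.5)
ci = percentile_bootstrap_ci(lambda yy, tt, ee: ipw_qte(yy, tt, ee, 0.5), (y, t, e))
print(f"median QTE: {median_qte:.3f}, 95% percentile CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

In practice the propensity score model would be refitted within each bootstrap replicate so that the interval also reflects the uncertainty from estimating the scores.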

Keywords

statistics
