Causal Inference in the Presence of Heterogeneous Treatment Effects

Liang, Wei

Files

Liang_Wei.pdf (1.06 MB)

Date

2025-07-07

Authors

Liang, Wei

Advisor

Wu, Changbao

Publisher

University of Waterloo

Abstract

Causal inference has been widely accepted as a statistical tool in various areas for demystifying causality from data. Treatment effect heterogeneity, a common issue in causal inference, refers to variation in the causal effect of a treatment across different subgroups or individuals within a population. This thesis explores three topics in causal inference in the presence of heterogeneous treatment effects, aiming to provide some insights into this critical issue. Chapter 2 introduces basic notation, frameworks, models, and parameters in causal inference, serving as preliminary material for the three topics studied in Chapters 3-5, with a focus on the Rubin causal model.
In Chapter 3, we discuss the first topic: causal inference with survey data. In the presence of heterogeneous treatment effects, a causal conclusion based on sample data may not generalize to a broader population if selection bias exists. We propose estimators for population average treatment effects by incorporating survey weights into the propensity score weighting approach to simultaneously mitigate confounding bias and selection bias. A robust sandwich variance estimator is developed to permit valid statistical inference for the population-level causal parameters under a proposed "two-phase randomization model" framework. The proposed estimators and associated inferential procedure are shown to be robust against model misspecifications. We further extend our results to observational non-probability survey samples and demonstrate how to combine auxiliary population information from multiple external reference probability samples for more reliable estimation. We illustrate our proposed methods through Monte Carlo simulation studies and the analysis of a real-world survey dataset.
Chapter 4 explores the second topic: estimation of the treatment harm rate (THR), the proportion of individuals in a population who are negatively affected by a treatment. The THR is a measure of treatment risk and reveals the treatment effect heterogeneity within a subpopulation. However, the measure is generally non-identifiable even when the treatments are randomly assigned, and existing works focus primarily on the estimation of the THR under either untestable identification assumptions or ambiguous model assumptions. We develop a class of partitioning-based bounds for the THR with data from randomized controlled trials. The bounds have two distinct features: they effectively use available auxiliary covariate information, and they can be consistently estimated without relying on any untestable or ambiguous model assumptions. Our methods are motivated by the key observation that the sharp bounds of the THR can be attained under a partition of the covariate space with at most four cells. Probabilistic classification algorithms are employed to estimate nuisance parameters to realize the partitioning. The resulting interval estimators of the THR are model-assisted in the sense that they are highly efficient when the underlying models are well fitted, while their validity relies solely on the randomization of the trials. The finite-sample performance of our proposed interval estimators, along with a conservatively extended confidence interval for the THR, is evaluated through Monte Carlo simulation studies. An application of the proposed methods to the ACTG 175 data is presented. A Python package named partbte for the partitioning-based algorithm has been developed and is available at https://github.com/w62liang/partition-te.
Chapter 5 investigates the third topic: causal mediation analysis in randomized controlled trials with noncompliance. The average causal mediation effect (ACME) and the natural direct effect (NDE) are two parameters of primary interest in causal mediation analysis. However, the two causal parameters are not identifiable in randomized controlled trials in the presence of mediator-outcome confounding and assignment-treatment noncompliance. In such scenarios, we explore partial identification of the parameters and derive nonparametric bounds on the ACME and the NDE when the treatment assignment serves as an instrumental variable. The nonparametric sharp bounds for the local causal parameters defined on the subpopulation of treatment-assignment compliers are also provided. We demonstrate the practical application of the proposed bounds through an empirical analysis of a large-scale randomized online advertising dataset.
The thesis concludes in Chapter 6 with a brief summary and a discussion of future work. Technical details, including the proofs of key propositions and theorems as well as additional simulation results, are provided at the end of each chapter.
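
To illustrate the idea of covariate-partitioned bounds on the THR described in the abstract, the following is a minimal sketch, not the thesis's estimator and not the API of the partbte package. It assumes a binary outcome where 1 is the favorable result, so harm means Y(1) = 0 and Y(0) = 1, and it applies the classical Fréchet-Hoeffding bounds within each cell of a user-supplied partition. The function names `thr_bounds` and `partitioned_thr_bounds` and the toy data are hypothetical.

```python
import numpy as np


def thr_bounds(y, t):
    """Frechet-Hoeffding bounds on P(Y(1)=0, Y(0)=1) from a randomized trial.

    Assumes binary y (1 = favorable outcome) and binary t (1 = treated),
    with at least one unit in each arm.
    """
    y = np.asarray(y)
    t = np.asarray(t)
    p1 = y[t == 1].mean()  # estimate of P(Y(1) = 1)
    p0 = y[t == 0].mean()  # estimate of P(Y(0) = 1)
    lower = max(0.0, p0 - p1)
    upper = min(p0, 1.0 - p1)
    return float(lower), float(upper)


def partitioned_thr_bounds(y, t, cell):
    """Average the within-cell bounds, weighted by each cell's sample share.

    `cell` is a hypothetical label assigning every unit to one cell of a
    covariate partition; each cell must contain both treated and control units.
    """
    y, t, cell = (np.asarray(a) for a in (y, t, cell))
    lower = 0.0
    upper = 0.0
    for c in np.unique(cell):
        mask = cell == c
        weight = mask.mean()  # share of the sample falling in this cell
        lo, hi = thr_bounds(y[mask], t[mask])
        lower += weight * lo
        upper += weight * hi
    return lower, upper


# Toy data: the treatment helps everyone in cell 0 and harms everyone in cell 1.
y = [1, 1, 0, 0, 0, 0, 1, 1]
t = [1, 1, 0, 0, 1, 1, 0, 0]
cell = [0, 0, 0, 0, 1, 1, 1, 1]

pooled = thr_bounds(y, t)
stratified = partitioned_thr_bounds(y, t, cell)
```

In this toy example the pooled bounds are uninformative about the lower end, while the partitioned bounds collapse to a single point, which shows why a partition that separates helped and harmed subgroups can sharpen the interval. How the thesis chooses the partition and estimates the nuisance parameters is not reproduced here.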

Keywords

causal inference, treatment effect heterogeneity, survey sampling, selection bias, partial identification

URI

https://hdl.handle.net/10012/21972

Collections

Theses
Statistics and Actuarial Science
