Statistical Methods for Mitigating Bias from Confounding and Measurement Error with Complex Exposures

Wang, Xiaoya

Files

Wang_Xiaoya.pdf (2.14 MB)

Date

2026-04-20

Authors

Wang, Xiaoya

Advisors

Cook, Richard J.
Zhu, Yeying

Publisher

University of Waterloo

Abstract

This thesis is concerned with methods for mitigating bias from confounding and measurement error with a semi-continuous exposure. The work is primarily motivated by the analysis of six longitudinal cohort studies investigating the effect of prenatal alcohol exposure (PAE) on childhood cognition. Prenatal alcohol exposure is reported as the average number of ounces of alcohol consumed each week during pregnancy; this is a semi-continuous variable with a point mass at zero, the value recorded for expectant mothers who do not consume alcohol during their pregnancy. Throughout the following chapters, we develop novel approaches to estimate causal effects for semi-continuous exposures of this sort and propose new strategies for addressing measurement error and misclassification. These methods are designed to enhance the validity, accuracy, and reliability of causal estimates in epidemiological studies and other applications involving semi-continuous exposures. Chapter 1 introduces the potential outcomes framework for causal inference and reviews key approaches for addressing confounding, including propensity score methods and related estimation strategies. It also summarizes core concepts in measurement error and misclassification, and introduces the motivating study. In Chapter 2, we extend methods for causal inference with binary treatment indicators to handle semi-continuous exposure variables. The exposure distribution is semi-continuous with a mass at zero (representing the unexposed sub-population) and a sub-density characterizing variation in the level of exposure among those exposed. We first formulate a potential outcomes framework for settings with a semi-continuous exposure, then develop a two-stage estimation procedure. In the first stage, the causal effect of the exposure level is assessed among exposed individuals using propensity score regression adjustment. 
In the second stage, the causal effect of the binary "exposure status" is evaluated using inverse probability weighted (IPW) and augmented inverse probability weighted (AIPW) estimating functions. We derive the large sample properties of the estimators resulting from the various methods of analysis and construct joint confidence regions for the causal effects. Simulation studies confirm good finite sample performance of the proposed estimators. We apply these new approaches to analyze data from the Detroit prenatal alcohol study. In Chapter 3, we address the challenge of causal inference regarding drinking status and the effect of dose for multiple outcomes representing domains of cognitive function. A two-stage estimating equation approach is proposed for multiple outcomes, with large sample properties derived for the resulting estimators. Homogeneity tests are developed to assess whether the causal effects of exposure status and the dose-response effects are the same across multiple outcomes. A global homogeneity test is also developed to assess whether the effect of exposure status (exposed/not exposed) and the dose-response effect of the continuous exposure level are each equal across all domains. The methods of estimation and testing are rigorously evaluated in simulation studies and applied to a motivating study on the effects of prenatal alcohol exposure on childhood cognition defined by executive function (EF), academic achievement in math, and learning and memory (LM). In Chapter 4, we develop likelihood-based methods to correct for the effect of a semi-continuous exposure subject to both misclassification of exposure status and measurement error in the exposure level. Motivated by repeated maternal self-reports of alcohol use collected during pregnancy, we specify a two-part measurement error model in which the binary indicator of any exposure may be misclassified and the log-transformed dose among the exposed is measured with error. 
Treating the true exposure components as latent, we derive two estimation strategies: a two-stage approach that estimates the exposure error process using the replicate data and then corrects the outcome model, and a joint approach that simultaneously estimates all model components using an EM algorithm. We establish the large sample properties of the estimators, extend the framework to multi-cohort studies with formal homogeneity tests to guide evidence synthesis across cohorts, and evaluate performance in simulation studies. The proposed methods are illustrated using data from two Pittsburgh prenatal alcohol cohorts, yielding corrected effect estimates and tests that inform whether pooling across cohorts is appropriate. Finally, Chapter 5 summarizes the contributions of this thesis and outlines directions for future research.
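
As a rough illustration of the IPW and AIPW estimators the abstract names for the binary exposure status, the Python sketch below compares them with an unadjusted difference in means on simulated data. It is not the thesis's implementation: the simulated data, the function names `fit_logistic` and `ipw_aipw`, the linear outcome model, and the true effect of -2 are all assumptions made here for illustration, and the first-stage dose-response analysis among the exposed is omitted entirely.

```python
import numpy as np

def fit_logistic(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (a - p)                        # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # observed information
        beta += np.linalg.solve(hess, grad)
    return beta

def ipw_aipw(y, a, X):
    """Return (IPW, AIPW) estimates of E[Y(1)] - E[Y(0)] for a binary exposure a."""
    ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, a)))  # estimated propensity score

    # Normalized (Hajek-type) inverse probability weighted means
    mu1_ipw = np.sum(a * y / ps) / np.sum(a / ps)
    mu0_ipw = np.sum((1 - a) * y / (1 - ps)) / np.sum((1 - a) / (1 - ps))

    # Arm-specific linear outcome regressions used for augmentation
    b1 = np.linalg.lstsq(X[a == 1], y[a == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(X[a == 0], y[a == 0], rcond=None)[0]
    m1, m0 = X @ b1, X @ b0

    # Augmented IPW means: outcome prediction plus weighted residual
    mu1_aipw = np.mean(m1 + a * (y - m1) / ps)
    mu0_aipw = np.mean(m0 + (1 - a) * (y - m0) / (1 - ps))
    return mu1_ipw - mu0_ipw, mu1_aipw - mu0_aipw

# Hypothetical simulated data: one confounder x drives both exposure and outcome.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * x)))   # exposure status (any exposure)
y = 1.0 - 2.0 * a + 1.5 * x + rng.normal(size=n)      # assumed true status effect: -2

naive_est = y[a == 1].mean() - y[a == 0].mean()       # confounded comparison
ipw_est, aipw_est = ipw_aipw(y, a, X)
print(f"naive={naive_est:.3f}  IPW={ipw_est:.3f}  AIPW={aipw_est:.3f}")
```

In this toy setup the unadjusted contrast is confounded by `x`, while both weighted estimators should land near the assumed effect. The AIPW form combines the propensity model with an outcome model, so it remains consistent if either one is correctly specified.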