Marginal Causal Sub-Group Analysis with Incomplete Covariate Data

Cuerden, Meaghan

Date

2019-01-11

Authors

Cuerden, Meaghan

Advisor

Cook, Richard
Cotton, Cecilia
Diao, Liqun

Publisher

University of Waterloo

Abstract

Incomplete data arise frequently in health research studies designed to investigate the causal relationship between a treatment or exposure and a response of interest. Statistical methods for conditional causal effect parameters in the setting of incomplete data have been developed, and we extend these methods to the estimation of marginal causal effect parameters. This thesis focuses on the estimation of marginal causal odds ratios, which are distinct from conditional causal odds ratios in logistic regression models; marginal causal odds ratios are frequently of interest in population studies. We introduce three methods for estimating the marginal causal odds ratio of a binary response for different levels of a subgroup variable, where the subgroup variable is incomplete. In each chapter, the subgroup variable, the exposure variable and the response variable are binary, and the subgroup variable is missing at random.

In Chapter 2, we begin with an overview of inverse probability weighted methods for confounding in an observational setting where data are complete. We also briefly review methods to deal with incomplete data in a randomized setting. We then introduce a doubly inverse probability weighted estimating equation approach to estimate marginal causal odds ratios in an observational setting where an important subgroup variable is incomplete. One inverse probability weight accounts for the incomplete data, and the other accounts for treatment selection. Only complete cases are included in the response model. Consistency results are derived, and a method to obtain estimates of the asymptotic standard error is introduced; the extra variability introduced by estimating two weights is incorporated in the estimation of the asymptotic standard error. We give a method for hypothesis testing and calculation of confidence intervals.
Simulation studies show that the doubly weighted estimating equation approach is effective in a non-ignorable missingness setting with confounding, and it is straightforward to implement. It also performs well when the missing data process is ignorable and/or when confounding is not present.

In Chapter 3, we begin with an overview of an EM algorithm approach for estimating conditional causal effect parameters in the setting of incomplete covariate data, in both randomized and observational settings. We then propose the use of a doubly weighted EM-type algorithm approach to estimate the marginal causal odds ratio in the setting of missing subgroup data. In this method, instead of using complete case analysis in the response model, all available data are used and the incomplete subgroup variable is “filled in” using a maximum likelihood approach. Two inverse probability weights are used here as well, to account for confounding and incomplete data. The weight that accounts for the incomplete data is needed, even though an EM approach is being used, because the marginal causal odds ratio is of interest. A method to obtain asymptotic standard error estimates is given, which incorporates the extra variability introduced by estimating the two inverse probability weights as well as the variability introduced by estimating the conditional expectation of the incomplete subgroup variable. Simulation studies show that this method is effective in terms of obtaining consistent estimates of the parameters of interest; however, it is difficult to implement, and in certain settings there is a loss of efficiency in comparison to the methods introduced in Chapter 2.

In Chapter 4, we begin by reviewing multiple imputation methods in randomized and observational settings, where estimation of the conditional causal odds ratio is of interest.
We then propose the use of multiple imputation with one inverse probability weight to account for confounding in an observational setting where the subgroup variable is incomplete. We discuss methods to correctly specify the imputation model in the setting where the conditional causal odds ratio is of interest, as well as in the setting where the marginal causal odds ratio is of interest. We use standard methods for combining the estimates of the marginal log odds ratios from each imputed dataset. We propose a method for estimating the asymptotic standard error of the estimates, which accounts for both the estimation of the parameters in the weight for confounding and the use of multiply imputed datasets. We give a method for hypothesis testing and calculation of confidence intervals. Simulation studies show that this method is efficient and straightforward to implement, but correct specification of the imputation model is necessary.

In Chapter 5, the three methods that have been introduced are used in an application to an observational cohort study of 418 colorectal cancer patients. We compare patients who received an experimental chemotherapy with patients who received standard chemotherapy; of interest is estimation of the marginal causal odds ratio of a thrombotic event during the course of treatment or 30 days after treatment is discontinued. The important subgroups are (i) patients receiving first line of treatment, and (ii) patients receiving second line of treatment.

In Chapter 6, we compare and contrast the three methods proposed. We also discuss extensions to different response models, models for missing response data, and weighted models in the longitudinal data setting.
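
The doubly inverse probability weighted idea described for Chapter 2 can be illustrated with a minimal sketch. It is not the thesis's estimating-equation implementation: it assumes a single binary confounder, uses saturated (cell-frequency) models for both weights instead of parametric regressions, and omits standard error estimation entirely. The simulated cohort, the function names `simulate` and `doubly_weighted_or`, and all numeric settings are hypothetical choices made for illustration only.

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Hypothetical cohort: confounder C, subgroup X, treatment A, response Y (all binary).

    R = 1 means X is observed. R depends only on the always-observed A and Y,
    so X is missing at random. All probabilities below are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = int(rng.random() < 0.5)
        x = int(rng.random() < 0.4)
        a = int(rng.random() < 0.3 + 0.3 * c + 0.1 * x)  # treatment selection
        y = int(rng.random() < 0.15 + 0.15 * a + 0.2 * c + 0.1 * x + 0.1 * a * x)
        r = int(rng.random() < 0.9 - 0.3 * y - 0.2 * a)  # missingness process
        rows.append({"c": c, "x": x if r else None, "a": a, "y": y, "r": r})
    return rows


def doubly_weighted_or(rows):
    """Return {x: estimated marginal causal odds ratio within subgroup X = x}."""
    # Weight 1 (incomplete data): 1 / P(R = 1 | A, Y, C), fitted on all subjects.
    total, observed = defaultdict(int), defaultdict(int)
    for d in rows:
        key = (d["a"], d["y"], d["c"])
        total[key] += 1
        observed[key] += d["r"]
    complete = [d for d in rows if d["r"] == 1]
    w_miss = [
        total[(d["a"], d["y"], d["c"])] / observed[(d["a"], d["y"], d["c"])]
        for d in complete
    ]

    # Weight 2 (treatment selection): 1 / P(A = a | X, C), fitted on the
    # complete cases after reweighting them by the missingness weight.
    treated, everyone = defaultdict(float), defaultdict(float)
    for d, w in zip(complete, w_miss):
        treated[(d["x"], d["c"])] += w * d["a"]
        everyone[(d["x"], d["c"])] += w

    # Doubly weighted risk of Y = 1 in each (treatment, subgroup) cell,
    # using complete cases only in the response step.
    num, den = defaultdict(float), defaultdict(float)
    for d, w in zip(complete, w_miss):
        p_treat = treated[(d["x"], d["c"])] / everyone[(d["x"], d["c"])]
        w_both = w / (p_treat if d["a"] == 1 else 1.0 - p_treat)
        num[(d["a"], d["x"])] += w_both * d["y"]
        den[(d["a"], d["x"])] += w_both

    result = {}
    for x in (0, 1):
        risk1 = num[(1, x)] / den[(1, x)]
        risk0 = num[(0, x)] / den[(0, x)]
        result[x] = (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))
    return result


if __name__ == "__main__":
    print(doubly_weighted_or(simulate(100_000, seed=1)))
```

Under the data-generating assumptions in `simulate`, the true marginal causal odds ratios work out to 2.0 in subgroup X = 0 and about 2.79 in subgroup X = 1, so estimates from a large simulated sample should fall near those values. With covariates that are not all binary, the two saturated models would presumably be replaced by regression models for the missingness and treatment probabilities.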