Empirical Likelihood Methods for Pretest-Posttest Studies
Pretest-posttest trials are widely used to assess treatment effects in many scientific fields. In a pretest-posttest study, subjects are randomized into two groups: treatment and control. Before the randomization, the pretest responses and other baseline covariates are recorded. After the randomization and a period of study time, the posttest responses are recorded. Existing methods for analyzing the treatment effect in pretest-posttest designs include the two-sample t-test using only the posttest responses, the paired t-test using the difference of the posttest and the pretest responses, and the analysis of covariance method, which assumes a linear model between the posttest and the pretest responses. These methods are summarized and compared by Yang and Tsiatis (2001) under a general semiparametric model which only assumes that the first and second moments of the baseline and the follow-up response variables exist and are finite. Leon et al. (2003) considered a semiparametric model based on counterfactuals, applied the theory of missing data and causal inference to develop a class of consistent estimators of the treatment effect, and identified the most efficient one in the class. Huang et al. (2008) proposed a semiparametric estimation procedure based on empirical likelihood (EL) which incorporates the pretest responses as well as baseline covariates to improve efficiency. The EL approach proposed by Huang et al. (2008) (the HQF method), however, dealt with the mean responses of the control group and the treatment group separately, and the confidence intervals were constructed through a bootstrap procedure on the conventional normalized Z-statistic. In this thesis, we first explore alternative EL formulations that directly involve the parameter of interest, i.e., the difference of the mean responses between the treatment group and the control group, using an approach similar to Wu and Yan (2012).
Pretest responses and other baseline covariates are incorporated to impute the potential posttest responses. We consider regression imputation as well as nonparametric kernel imputation. We derive the asymptotic distributions of the empirical likelihood ratio statistics, which are shown to be scaled chi-squares. The results are used to construct confidence intervals and to conduct statistical hypothesis tests. We also derive the explicit asymptotic variance formula of the HQF estimator, and compare it to the asymptotic variance of the estimator based on our proposed method under several scenarios. We find that the estimator based on our proposed method is more efficient than the HQF estimator under a linear model without an intercept that links the posttest responses and the pretest responses. When there is an intercept, our proposed method is as efficient as the HQF method. When the working models are misspecified, our proposed method based on kernel imputation is the most efficient.
While the treatment effect is of primary interest in the analysis of pretest-posttest sample data, testing the difference between the distribution functions of the treatment and control groups is also an important problem. For two independent samples, the nonparametric Mann-Whitney test has been a standard tool for testing the difference of two distribution functions. Owen (2001) presented an EL formulation of the Mann-Whitney test, but the computation is heavy because the constraints involve a U-statistic. We develop EL-based methods for the Mann-Whitney test that incorporate the two unique features of pretest-posttest studies: (i) the availability of baseline information for both groups; and (ii) the missing-by-design structure of the data.
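To illustrate why a U-statistic in the constraints is costly, the following minimal sketch computes the standard two-sample Mann-Whitney estimate of P(Y1 > Y0) + 0.5 P(Y1 = Y0) by averaging a kernel over all treatment-control pairs. It uses only posttest responses and is not the EL-based procedure developed in the thesis; the function name `mann_whitney_theta` and the sample values are hypothetical.

```python
from itertools import product


def mann_whitney_theta(treated, control):
    """Average the Mann-Whitney kernel over all (treated, control) pairs.

    The kernel is 1 if y1 > y0, 0.5 if y1 == y0, and 0 otherwise, so the
    result estimates P(Y1 > Y0) + 0.5 * P(Y1 == Y0). The double loop visits
    len(treated) * len(control) pairs, which is the source of the cost.
    """
    if not treated or not control:
        raise ValueError("both samples must be non-empty")
    total = 0.0
    for y1, y0 in product(treated, control):
        if y1 > y0:
            total += 1.0
        elif y1 == y0:
            total += 0.5
    return total / (len(treated) * len(control))


# Hypothetical posttest responses, for illustration only.
treated = [5.1, 6.3, 4.8, 7.0]
control = [4.9, 5.0, 6.1]
theta_hat = mann_whitney_theta(treated, control)
print(round(theta_hat, 4))  # 8 of the 12 pairs favour treatment: 0.6667
```

When the two distributions are identical, the target quantity equals 0.5, so values of the estimate far from 0.5 point to a difference between the groups.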
Our proposed methods combine the standard Mann-Whitney test with the empirical likelihood method of Huang, Qin and Follmann (2008), the imputation-based empirical likelihood method of Chen, Wu and Thompson (2014a), and the jackknife empirical likelihood (JEL) method of Jing, Yuan and Zhou (2009). The JEL method substantially reduces the computational burden of the constrained maximization problems. We also develop bootstrap calibration methods for the proposed EL-based Mann-Whitney test when the corresponding EL ratio statistic does not have a standard asymptotic chi-square distribution. We conduct simulation studies to compare the finite-sample performance of the proposed methods. Our results show that the Mann-Whitney test based on the Huang, Qin and Follmann estimators and the test based on the two-sample JEL method perform very well. In addition, incorporating the baseline information makes the test more powerful.
Finally, we consider the EL method for pretest-posttest studies when the design and data collection involve complex surveys. We consider both stratification and inverse probability weighting via propensity scores to balance the distributions of the baseline covariates between the two treatment groups. We use a pseudo empirical likelihood approach to make inference on the treatment effect. The proposed methods are illustrated through an application using data from the International Tobacco Control (ITC) Policy Evaluation Project Four Country (4C) Survey.
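As a rough illustration of the inverse probability weighting idea mentioned above, the sketch below computes a ratio-type (Hajek) weighted difference in mean posttest responses. It assumes the propensity scores have already been estimated and lie strictly between 0 and 1; survey design weights and the pseudo empirical likelihood step are left out for brevity. The function name `ipw_treatment_effect` and the data are hypothetical, and this is not presented as the procedure used in the thesis.

```python
def ipw_treatment_effect(y, treated, propensity):
    """Ratio-type IPW estimate of mean(Y1) - mean(Y0).

    y          : observed posttest responses
    treated    : 1 for treatment, 0 for control
    propensity : assumed-known estimates of P(treated = 1 | covariates)
    """
    if not (len(y) == len(treated) == len(propensity)):
        raise ValueError("inputs must have the same length")
    num1 = den1 = num0 = den0 = 0.0
    for yi, ti, pi in zip(y, treated, propensity):
        if not 0.0 < pi < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        if ti == 1:
            w = 1.0 / pi            # weight for a treated subject
            num1 += w * yi
            den1 += w
        else:
            w = 1.0 / (1.0 - pi)    # weight for a control subject
            num0 += w * yi
            den0 += w
    if den1 == 0.0 or den0 == 0.0:
        raise ValueError("both groups must be represented")
    # Normalizing by the summed weights gives weighted group means.
    return num1 / den1 - num0 / den0


# Hypothetical data: with equal propensities the estimate reduces to the
# plain difference of group means, (3 + 5) / 2 - (2 + 4) / 2.
effect = ipw_treatment_effect([3.0, 5.0, 2.0, 4.0], [1, 1, 0, 0], [0.5] * 4)
print(effect)  # 1.0
```

Subjects who were unlikely to end up in their observed group receive larger weights, which is how the weighting rebalances the baseline covariate distributions between the two groups.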