Causal Inference and Matrix Completion with Correlated Incomplete Data

Sun, Zhaohan

dc.contributor.author	Sun, Zhaohan
dc.date.accessioned	2023-01-19 19:21:02 (GMT)
dc.date.available	2023-01-19 19:21:02 (GMT)
dc.date.issued	2023-01-19
dc.date.submitted	2023-01-17
dc.identifier.uri	http://hdl.handle.net/10012/19083
dc.description.abstract	Missing data problems are frequently encountered in biomedical research, social sciences, and environmental studies. When data are missing completely at random, a complete-case analysis may be the easiest approach. However, when data are not missing completely at random, ignoring the missing values will result in biased estimators. Much work over the last two decades has addressed missing data, including likelihood-based methods, imputation methods, and Bayesian approaches. Matrix completion is one of the imputation approaches that has been widely discussed in the missing data literature. However, in a longitudinal setting, limited effort has been devoted to using covariate information to recover the outcome matrix via matrix completion when the response is subject to missingness. Chapter 1 introduces the basic definitions and concepts for different types of correlated data, along with the matrix completion algorithms and semiparametric approaches used to handle missingness in the correlated data literature. The definitions of robust estimation and of interference in causal inference are also presented in this chapter. In Chapter 2, we consider the prediction of missing responses in a longitudinal dataset via matrix completion. We propose a fixed effects longitudinal low-rank model which incorporates both subject-specific and time-specific covariates. The responses are allowed to be missing at random, and the inverse probability weighting approach is used to debias the traditional quadratic loss in the matrix completion literature. To solve the optimization problem, a two-step optimization algorithm is proposed, which provides good statistical properties for the estimation of the fixed effects and the low-rank term. In the theoretical investigation, non-asymptotic error bounds on the fixed effects and the low-rank term are presented. 
We illustrate the finite-sample performance of the proposed algorithm via simulation studies and apply our method to both a COVID-19 dataset and a PM2.5 emissions dataset. In Chapter 3, we consider the partial interference setting, in which the whole population can be partitioned into clusters such that the outcome of each unit depends on the interventions received by other units within the same cluster, but not on those received by units in different clusters. We also assume that the confounders are subject to nonignorable missingness. We propose three distinct consistent estimators for the direct, indirect, total, and overall effects of the intervention on the outcome, and derive the corresponding asymptotic results. A comprehensive simulation study is also carried out to investigate the finite-sample properties of the proposed estimators. We illustrate the proposed methods by analyzing data collected from the Acid Rain Program, which was launched to reduce air pollution in the USA by encouraging the installation of scrubbers at power plants, where the records of some operating characteristics of the power generating facilities are subject to missingness. In Chapter 4, we focus on the estimation of network causal effects. Under the setting of nonignorable missing confounders, we develop a multiply robust estimation procedure that gains extra protection against model misspecification. Compared with the doubly robust estimators proposed in Chapter 3, the proposed multiply robust estimators are consistent if either one pair of models for the treatment propensity score and the missingness mechanism, or the joint model of the confounders and the outcome, is correctly specified. The finite-sample performance of the proposed methods under different missingness rates and cluster sizes is investigated, and we further illustrate the proposed methods with the same real data used in Chapter 3. We conclude this thesis and discuss future work in Chapter 5. 
Specifically, in Section 5.1, we summarize the contributions of the chapters in this thesis. In Section 5.2, we discuss an extension of Chapter 2 in which the construction of confidence intervals for the low-rank term and the estimated fixed effects is investigated. Finally, in Section 5.3, we briefly discuss potential extensions of Chapters 3 and 4 to a more general setting.	en
dc.language.iso	en	en
dc.publisher	University of Waterloo	en
dc.subject	Missing data	en
dc.subject	Causal inference	en
dc.title	Causal Inference and Matrix Completion with Correlated Incomplete Data	en
dc.type	Doctoral Thesis	en
dc.pending	false
uws-etd.degree.department	Statistics and Actuarial Science	en
uws-etd.degree.discipline	Statistics	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Zhu, Yeying
uws.contributor.advisor	Dubin, Joel
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.typeOfResource	Text	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en

Files in this item

Name: Zhaohan_Sun.pdf
Size: 3.734 MB
Format: PDF
