Empirical Likelihood Methods for Some Incomplete Data Problems

Che, Menglu

dc.contributor.advisor	Han, Peisong
dc.contributor.advisor	Lawless, Jerald
dc.contributor.author	Che, Menglu
dc.date.accessioned	2020-12-18T20:29:41Z
dc.date.available	2020-12-18T20:29:41Z
dc.date.issued	2020-12-18
dc.date.submitted	2020-12-17
dc.description.abstract	Incomplete data often brings difficulty to estimation and inference. A complete case (CC) analysis, in most cases, leads to biased estimates, or it may not have the desired estimation efficiency. In this thesis, we develop statistical methods addressing the estimation of regression parameters with missing covariates. We are interested in improving the estimation efficiency by incorporating the information from the partially observed cases. Chapter 1 is an introduction to incomplete data problems and some existing estimation frameworks. We present the major tool we utilize to improve the estimation efficiency, i.e., empirical likelihood for general estimating functions. A brief introduction to the problems we solve in the subsequent chapters is also provided. Chapter 2 considers a regression problem with covariates missing not at random, where the missingness depends on the missing covariate values. For this type of missingness, CC analysis leads to consistent estimation when the missingness is independent of the response given all covariates, but it may not have the desired level of efficiency. We propose a general empirical likelihood framework to improve the estimation efficiency upon CC analysis. We expand on methods in Bartlett, Carpenter, Tilling & Vansteelandt (2014) and Xie & Zhang (2017). Instead of improving the efficiency by modelling the missingness probability conditional on the response and fully observed covariates, our method allows the possibility of modelling other data distribution-related quantities. We also give guidelines on what quantities to model and demonstrate that our proposal has the potential to yield smaller biases than existing methods when the missingness probability model is incorrect. Simulation studies are presented, as well as an application to data collected from the US National Health and Nutrition Examination Survey.
Chapters 3 and 4 concern another type of incomplete data, namely the two-phase, response-dependent or outcome-dependent sample. This type of sampling is often used in regression settings that involve expensive covariate measurements. Conditional maximum likelihood (CML) is an attractive approach in many cases as it avoids modelling the covariate distribution, unlike full maximum likelihood. Moreover, it can handle zero selection probabilities in the Phase 2 sampling. In Chapter 3, we consider general regression models with either a discrete or continuous response. We show that the estimator of covariate effects proposed by Scott & Wild (2011) has the same asymptotic efficiency as two empirical likelihood estimators, and that these estimators dominate the CML estimator. Chapter 4 proposes a more general empirical likelihood method within the CML framework to incorporate the information in the Phase 1 sample and improve estimation efficiency. The proposed method exploits a model which only involves the fully observed variates. It maintains the ability to handle zero selection probabilities and avoids modelling the covariate distribution. The proposed methods exhibit improvement upon CML as well as the estimator of Scott & Wild (2011) considered in Chapter 3. In these two chapters, we compare the efficiencies of various estimators in simulation studies and illustrate the methodologies in a two-phase genetics study. Chapter 5 presents some additional discussion and some topics for future research. We summarize the key points in our framework utilizing auxiliary information to improve estimation efficiency. Some additional remarks are given on the issues of numerical implementation, model diagnosis, and model compatibility. Finally, we discuss some topics for future research that are related to the methods considered in the thesis.	en
dc.identifier.uri	http://hdl.handle.net/10012/16578
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	empirical likelihood	en
dc.subject	missing data	en
dc.subject	estimating equations	en
dc.subject	two-phase samples	en
dc.title	Empirical Likelihood Methods for Some Incomplete Data Problems	en
dc.type	Doctoral Thesis	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.degree.department	Statistics and Actuarial Science	en
uws-etd.degree.discipline	Statistics	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Han, Peisong
uws.contributor.advisor	Lawless, Jerald
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Che_Menglu.pdf
Size: 666.95 KB
Format: Adobe Portable Document Format

Collections

Theses
Statistics and Actuarial Science