Statistical Analysis with Non-probability Survey Samples
Date
2020-09-25
Authors
Chen, Yilin
Advisor
Wu, Changbao
Li, Pengfei
Publisher
University of Waterloo
Abstract
The goal of this thesis is to develop inferential procedures for non-probability survey samples. In recent years, the use of non-probability survey samples has become one of the most important topics in survey sampling. In contrast to the burdensome process of obtaining probability samples, non-probability survey samples, empowered by information technology, can be acquired through the internet and other convenient means in a timely and efficient manner. These prompt and affordable data have facilitated online research for both academic and industrial purposes.
Nevertheless, non-probability survey samples are biased samples, from which valid inferences about the target population cannot be obtained directly. A popular tool for bias correction is the propensity score associated with each unit in the population, defined as the probability of inclusion in the non-probability sample conditional on the observed auxiliary variables. Propensity scores must be estimated in practice, but existing estimation methods are mainly derived on an ad hoc basis. This thesis establishes a general framework for statistical inference with non-probability survey samples when relevant auxiliary information is available from a reference probability survey sample. Under this framework, we develop a rigorous procedure for estimating propensity scores. The main idea of the procedure is to approximate the required but unknown population-level information by its estimate based on the reference sample. Given the estimated propensity scores, we further present two parallel approaches to estimating the finite population mean: the quasi-randomization (QR) approach and the pseudo-empirical likelihood (PEL) approach. Moreover, the potential issue of zero propensity scores is highlighted and investigated.
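The propensity-score step and the QR estimator can be illustrated with a minimal sketch. It rests on assumptions the abstract does not spell out: a logistic propensity model pi(x) = expit(theta0 + x'theta1); a pseudo log-likelihood in which the unknown sum over all population units is replaced by the design-weighted sum over the reference sample S_B, giving the score equation sum over S_A of x_i minus sum over S_B of d_i pi(x_i) x_i equal to zero; and a Hájek-type inverse-propensity-weighted mean as the QR estimator. The function names `estimate_propensity` and `qr_mean` are hypothetical, the data are synthetic, and the PEL approach and the zero-propensity issue are not covered.

```python
import numpy as np


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def estimate_propensity(x_a, x_b, d_b, max_iter=100, tol=1e-10):
    """Fit an (assumed) logistic propensity model by Newton-Raphson.

    Solves the pseudo score equation
        sum_{i in S_A} x_i - sum_{i in S_B} d_i * pi(x_i) * x_i = 0,
    where the population-level sum is estimated from the reference
    probability sample S_B with design weights d_b.
    """
    xa = np.column_stack([np.ones(len(x_a)), x_a])  # add intercept column
    xb = np.column_stack([np.ones(len(x_b)), x_b])
    n_a, n_hat = len(xa), d_b.sum()                 # N estimated by sum of d_b
    theta = np.zeros(xa.shape[1])
    theta[0] = np.log(n_a / (n_hat - n_a))          # start at the overall rate
    target = xa.sum(axis=0)                         # sum of x over S_A
    for _ in range(max_iter):
        p = expit(xb @ theta)
        score = target - xb.T @ (d_b * p)
        # Negative Jacobian of the score: sum_B d_i p_i (1 - p_i) x_i x_i'
        info = xb.T @ (xb * (d_b * p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, score)         # Newton-Raphson update
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


def qr_mean(y_a, x_a, theta):
    """Hajek-type inverse-propensity-weighted mean of y over S_A."""
    xa = np.column_stack([np.ones(len(x_a)), x_a])
    w = 1.0 / expit(xa @ theta)
    return np.sum(w * y_a) / np.sum(w)


# Synthetic illustration: every number below is made up for this sketch.
rng = np.random.default_rng(2020)
N = 50_000
x = rng.normal(size=N)
y = 2.0 + x + rng.normal(size=N)
in_a = rng.random(N) < expit(-3.0 + 0.8 * x)     # self-selection into S_A
idx_b = rng.choice(N, size=2_000, replace=False)  # SRS reference sample S_B
d_b = np.full(idx_b.size, N / idx_b.size)         # design weights N / n_B

theta_hat = estimate_propensity(x[in_a], x[idx_b], d_b)
mu_qr = qr_mean(y[in_a], x[in_a], theta_hat)
naive = y[in_a].mean()
print(f"population {y.mean():.3f}, naive {naive:.3f}, QR {mu_qr:.3f}")
```

In this setup, units with larger x are more likely to self-select, so the naive sample mean of S_A overstates the population mean, while the inverse-propensity-weighted mean should land much closer to it. Only the reference sample's x values and design weights enter the propensity fit; y is never needed from S_B.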
Keywords
non-probability sample, survey sampling, data merging, administrative data, multiple datasets, doubly robust estimation