Statistical Analysis with Non-probability Survey Samples

Date

2020-09-25

Authors

Chen, Yilin

Publisher

University of Waterloo

Abstract

The goal of this thesis is to develop inferential procedures for non-probability survey samples. In recent years, the use of non-probability survey samples has become one of the most important topics in survey sampling. In contrast to the burdensome process of obtaining probability samples, non-probability survey samples, empowered by information technology, can be acquired quickly and efficiently through the internet and other convenient means. These prompt and affordable data have facilitated online research for both academic and industrial purposes. Nevertheless, non-probability survey samples are biased, and valid inferences about the target population cannot be drawn from them directly. A popular tool for bias correction is the propensity score associated with each unit in the population, defined as the probability of selection conditional on observed auxiliary variables. Propensity scores need to be estimated in practice, but existing estimation methods are mainly derived on an ad hoc basis. This thesis establishes a general framework for statistical inference with non-probability survey samples when relevant auxiliary information is available from a reference probability survey sample. Under this framework, we develop a rigorous procedure for estimating propensity scores. The main idea of the procedure is to approximate the required but unknown population-level information by its estimate based on the reference sample. Given the estimated propensity scores, we further present two parallel approaches to estimating the finite population mean: the quasi-randomization (QR) approach and the pseudo-empirical likelihood (PEL) approach. Moreover, the potential issue of zero propensity scores is highlighted and investigated.
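
The idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the model or the estimating equations, so everything specific here is an assumption made for illustration: a logistic propensity model, a pseudo log-likelihood in which the unknown population total is replaced by a design-weighted total over the reference sample, and an inverse-propensity-weighted (Hájek-type) mean as a stand-in for the QR estimator. The function names (`fit_propensity`, `ipw_mean`) and the simulated data are hypothetical. The PEL approach is not sketched.

```python
import numpy as np


def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_propensity(x_a, x_b, d_b, n_iter=50, tol=1e-10):
    """Newton-Raphson for an assumed pseudo log-likelihood

        l(theta) = sum_{i in A} x_i' theta
                   - sum_{i in B} d_i * log(1 + exp(x_i' theta)),

    where A is the non-probability sample, B is the reference probability
    sample, and d_i are the design weights of B. The second sum estimates
    the unknown population total from the reference sample.
    """
    theta = np.zeros(x_a.shape[1])
    for _ in range(n_iter):
        pi_b = expit(x_b @ theta)
        # Score: total of x over A minus weighted total of pi * x over B.
        score = x_a.sum(axis=0) - x_b.T @ (d_b * pi_b)
        # Hessian of the concave objective (negative definite).
        hess = -(x_b * (d_b * pi_b * (1.0 - pi_b))[:, None]).T @ x_b
        step = np.linalg.solve(hess, score)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta


def ipw_mean(y_a, pi_a):
    """Inverse-propensity-weighted (Hajek-type) estimate of the mean."""
    w = 1.0 / pi_a
    return np.sum(w * y_a) / np.sum(w)


# Hypothetical finite population: y depends on x, and so does selection.
rng = np.random.default_rng(2020)
N = 20000
x = rng.normal(size=N)
y = 2.0 + x + rng.normal(size=N)
X = np.column_stack([np.ones(N), x])

# Non-probability sample A: units self-select with unknown propensity.
pi_true = expit(X @ np.array([-3.0, 0.8]))
in_a = rng.random(N) < pi_true

# Reference sample B: simple random sample with design weights N / n_b.
n_b = 2000
idx_b = rng.choice(N, size=n_b, replace=False)
d_b = np.full(n_b, N / n_b)

theta_hat = fit_propensity(X[in_a], X[idx_b], d_b)
pi_hat = expit(X[in_a] @ theta_hat)

mu_true = y.mean()
mu_naive = y[in_a].mean()
mu_ipw = ipw_mean(y[in_a], pi_hat)
print(f"true mean {mu_true:.3f}, naive {mu_naive:.3f}, weighted {mu_ipw:.3f}")
```

In this simulated setting the unweighted mean of the non-probability sample overstates the population mean because units with larger `x` are more likely to participate, and weighting by the inverse of the estimated propensity scores reduces that bias. Note that the weights `1 / pi_hat` become unstable as estimated propensities approach zero, which relates to the zero-propensity issue the abstract mentions.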

Keywords

non-probability sample, survey sampling, data merging, administrative data, multiple datasets, doubly robust estimation
