Semiparametric Empirical Likelihood Inference under Two-sample Density Ratio Models
The semiparametric density ratio model (DRM) provides a flexible platform for combining information from multiple sources and has been widely used in many fields. This thesis considers several important inference problems under two-sample DRMs.

Chapter 1 serves as an introduction. We review the DRM, empirical likelihood (a useful inference tool under the DRM), and some applications of DRMs. We also outline the research problems explored in the subsequent chapters.

How to effectively use auxiliary information and data from multiple sources to enhance statistical inference is an important and active research topic in many fields. In Chapter 2, we consider statistical inference under two-sample DRMs in which additional parameters, including the main parameters of interest, are defined through estimating equations and/or additional auxiliary information is expressed as estimating equations. We examine the asymptotic properties of the maximum empirical likelihood estimators (MELEs) of the unknown parameters, both those in the DRMs and those defined through estimating equations, and establish the chi-square limiting distributions of the empirical likelihood ratio (ELR) statistics. We show that the asymptotic variance of the MELEs of the unknown parameters does not decrease if one estimating equation is dropped. Similar properties are obtained for inference on the cumulative distribution function and quantiles of each of the populations involved. We also propose an ELR test for the validity and usefulness of the auxiliary information. Simulation studies show that correctly specified estimating equations for the auxiliary information result in more efficient estimators and shorter confidence intervals. Two real-data examples are used for illustration.

The Youden index is a popular summary statistic for receiver operating characteristic curves. It gives the optimal cutoff point of a biomarker for distinguishing diseased from healthy individuals. In Chapter 3, we model the distributions of a biomarker for individuals in the healthy and diseased groups via a DRM. Based on this model, we propose MELEs of the Youden index and the optimal cutoff point. We further establish the asymptotic normality of the proposed estimators and construct valid confidence intervals for the Youden index and the corresponding optimal cutoff point. The proposed method automatically covers both the case with no lower limit of detection and the case with a fixed and finite lower limit of detection for the biomarker. Extensive simulation studies and a real-data example illustrate the effectiveness of the proposed method and its advantages over existing methods.

The Gini index is a popular inequality measure with many applications in social and economic studies. Chapter 4 studies inference on the Gini indices of two semicontinuous populations. We characterize the distribution of each semicontinuous population by a mixture of a discrete point mass at zero and a continuous, skewed, positive component. The DRM is then employed to link the positive components of the two distributions. We propose MELEs of the two Gini indices and their difference, and investigate the asymptotic properties of the proposed estimators. The asymptotic results enable us to construct confidence intervals and perform hypothesis tests for the two Gini indices and their difference. We show that the proposed estimators are more efficient than the existing fully nonparametric estimators. The proposed estimators and the asymptotic results are also applicable to cases without excess zeros. Simulation studies show the superiority of the proposed method over existing methods. Two real-data applications illustrate the proposed methods.

In Chapter 5, we summarize our research contributions and discuss some topics for future research that are related to the current work.
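For readers unfamiliar with the two summary measures named above, the following is a minimal sketch of their fully nonparametric (plug-in) versions. It is not the DRM-based MELE proposed in the thesis, only the kind of baseline such estimators are compared against. The function names `empirical_youden` and `empirical_gini` are hypothetical, and the Youden sketch assumes that larger biomarker values indicate disease, so that the index is the maximum over cutoffs c of F0(c) - F1(c), with F0 and F1 the healthy and diseased distribution functions.

```python
from bisect import bisect_right


def empirical_youden(healthy, diseased):
    """Plug-in Youden index and cutoff (assumes larger values indicate disease).

    Returns (J, c), where J = max over c of F0_hat(c) - F1_hat(c) and c is the
    smallest observed value attaining the maximum. Illustrative only; this is
    the nonparametric estimator, not a DRM-based one.
    """
    h = sorted(healthy)
    d = sorted(diseased)
    best_j, best_c = -1.0, None
    # The empirical CDFs only change at observed values, so those are the
    # only candidate cutoffs that need checking.
    for c in sorted(set(h) | set(d)):
        f0 = bisect_right(h, c) / len(h)  # proportion of healthy values <= c
        f1 = bisect_right(d, c) / len(d)  # proportion of diseased values <= c
        if f0 - f1 > best_j:
            best_j, best_c = f0 - f1, c
    return best_j, best_c


def empirical_gini(values):
    """Plug-in Gini index: mean absolute difference divided by twice the mean.

    Uses the order-statistic form sum_i (2i - n - 1) x_(i) / (n * sum_i x_i).
    Zeros are allowed (as in semicontinuous data), but the total must be positive.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total <= 0:
        raise ValueError("Gini index is undefined when the sample total is not positive")
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)
```

As a quick check, `empirical_gini([0, 0, 0, 1])` evaluates to 0.75, and `empirical_youden([1, 2, 3], [4, 5, 6])` returns an index of 1.0 at cutoff 3, since the two samples are perfectly separated.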
Cite this version of the work
Meng Yuan (2021). Semiparametric Empirical Likelihood Inference under Two-sample Density Ratio Models. UWSpace. http://hdl.handle.net/10012/17301