Statistical Inference in ROC Curve Analysis

Hu, Dingding

dc.contributor.author	Hu, Dingding
dc.date.accessioned	2025-07-07T15:20:52Z
dc.date.available	2025-07-07T15:20:52Z
dc.date.issued	2025-07-07
dc.date.submitted	2025-07-02
dc.description.abstract	The receiver operating characteristic (ROC) curve is a powerful statistical tool for evaluating the diagnostic ability of a binary classifier across varying discrimination thresholds. It has been widely applied in many scientific areas. This thesis considers three inference problems in ROC curve analysis. In Chapter 1, we introduce the basic concept of the ROC curve, along with some of its summary indices. We then provide an overview of the research problems and outline the structure of the subsequent chapters. Chapter 2 focuses on improving ROC curve analysis with a single biomarker by incorporating the assumption that higher biomarker values indicate greater disease severity or likelihood. We interpret “greater severity” as a higher probability of disease, which corresponds to the likelihood ratio ordering between diseased and healthy individuals. Under this assumption, we propose a Bernstein polynomial-based method to model and estimate the biomarker distributions using the maximum empirical likelihood framework. From the estimated distributions, we derive the ROC curve and its summary indices. We establish the asymptotic consistency of our estimators and, through extensive simulations, evaluate their performance and compare them with existing methods. A real-data example demonstrates the practical applicability of our approach. Chapter 3 considers ROC curve analysis for medical data with non-ignorable missingness in the disease status. Within the framework of logistic regression models for both the disease status and the verification status, we first establish the identifiability of the model parameters and then propose a likelihood method to estimate the model parameters, the ROC curve, and the area under the ROC curve (AUC) for the biomarker. The asymptotic distributions of these estimators are established. Via extensive simulation studies, we compare our method with competing methods in point estimation and assess the accuracy of confidence interval estimation under various scenarios. To illustrate the method in practice, we apply it to the Alzheimer's disease dataset from the National Alzheimer's Coordinating Center. Chapter 4 explores the combination of multiple biomarkers when disease status is determined by an imperfect reference standard, which can lead to misclassification. Previous methods for combining multiple biomarkers typically assume that all disease statuses are determined by a gold standard test, which limits their ability to accurately estimate the ROC curve and AUC in the presence of misclassification. We propose modeling the distributions of biomarkers from truly healthy and diseased individuals using a semiparametric density ratio model. We also adopt two assumptions from the literature: (1) the biomarkers are conditionally independent of the classification of the imperfect reference standard given the true disease status, and (2) the classification accuracy of the imperfect reference standard is known. Using this framework, we establish the identifiability of the model parameters and propose a maximum empirical likelihood method to estimate the ROC curve and AUC for the optimal combination of biomarkers. An Expectation-Maximization algorithm is developed for numerical calculation. We further propose a bootstrap method to construct the confidence interval for the AUC and the confidence band for the ROC curve. Extensive simulations are conducted to evaluate the robustness of our method with respect to label misclassification. Finally, we demonstrate the effectiveness of our method in a real-data application. In Chapter 5, we provide a brief summary of Chapters 2-4 and outline several directions for future research.
dc.identifier.uri	https://hdl.handle.net/10012/21975
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	ROC curve
dc.subject	empirical likelihood
dc.subject	non-ignorable missing
dc.subject	EM algorithm
dc.subject	AUC
dc.subject	Youden index
dc.subject	imperfect reference
dc.title	Statistical Inference in ROC Curve Analysis
dc.type	Doctoral Thesis
uws-etd.degree	Doctor of Philosophy
uws-etd.degree.department	Statistics and Actuarial Science
uws-etd.degree.discipline	Statistics
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0
uws.contributor.advisor	Li, Pengfei
uws.contributor.affiliation1	Faculty of Mathematics
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
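
For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of the standard empirical ROC curve, the AUC, and the Youden index for a single biomarker. It is not the thesis's method (which uses Bernstein polynomials, empirical likelihood, and density ratio models); the function names and the sample values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def empirical_roc(healthy: Sequence[float], diseased: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, calling a subject 'diseased' when value >= cutoff."""
    cutoffs = sorted(set(healthy) | set(diseased), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every observation
    for c in cutoffs:
        fpr = sum(h >= c for h in healthy) / len(healthy)
        tpr = sum(d >= c for d in diseased) / len(diseased)
        points.append((fpr, tpr))
    return points


def auc_mann_whitney(healthy: Sequence[float], diseased: Sequence[float]) -> float:
    """Estimate P(diseased > healthy) + 0.5 * P(tie) over all pairs."""
    score = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                score += 1.0
            elif d == h:
                score += 0.5
    return score / (len(healthy) * len(diseased))


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear empirical ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def youden_index(points: Sequence[Tuple[float, float]]) -> float:
    """Maximum of TPR - FPR (sensitivity + specificity - 1) over all cutoffs."""
    return max(tpr - fpr for fpr, tpr in points)


# Hypothetical biomarker values, for illustration only.
healthy = [0.1, 0.4, 0.35, 0.8]
diseased = [0.5, 0.9, 0.7, 0.3]

roc = empirical_roc(healthy, diseased)
print("AUC (Mann-Whitney):", auc_mann_whitney(healthy, diseased))
print("AUC (trapezoid):   ", auc_trapezoid(roc))
print("Youden index:      ", youden_index(roc))
```

The two AUC functions agree on any sample because the area under the empirical ROC curve equals the Mann-Whitney statistic, with ties counted as one half.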

Files

Original bundle

Name: Hu_Dingding.pdf
Size: 1.29 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission

Collections

Theses
Statistics and Actuarial Science