High-Dimensional Statistical Inference and False Discovery Rate Control with Covariates

Zheng, Liyuan

dc.contributor.advisor	Qin, Yingli
dc.contributor.advisor	Liang, Kun
dc.contributor.author	Zheng, Liyuan
dc.date.accessioned	2025-01-17T16:18:13Z
dc.date.available	2025-01-17T16:18:13Z
dc.date.issued	2025-01-17
dc.date.submitted	2025-01-15
dc.description.abstract	In this thesis, we focus on three statistical problems. First, we consider graph-based tests for differences between two high-dimensional distributions. Second, we investigate the estimation of multiple large covariance matrices and its application to high-dimensional quadratic discriminant analysis. Lastly, we focus on controlling the false discovery rate while incorporating complex auxiliary information.
Testing whether two samples are from a common distribution is an important problem in statistics. Friedman & Rafsky (1979) proposed a non-parametric multivariate distribution test based on the minimal spanning tree (MST). Recently, this test has been extended to various scenarios. However, as demonstrated in Chapter 2, these extensions are not sensitive to sparse alternatives. To address this, we propose a two-step testing procedure, IM-MST. Specifically, IM-MST incorporates marginal screening while accounting for the dependence structure via energy distance, followed by MST-based tests. IM-MST combines the strengths of non-parametric screening and MST-based tests. Simulation studies and real data applications are conducted to evaluate the numerical performance of the two-step procedure, demonstrating that IM-MST exhibits substantial power gains.
When estimating covariance matrices for data from two related categories, it is reasonable to assume that these covariance matrices share certain structural components. As a result, the precision matrix (the inverse of the covariance matrix) for each category can be decomposed into three parts: a common diagonal component, a common low-rank component, and a category-specific low-rank component. This decomposition can be motivated by a factor model, where some latent factors are common across the two categories while others are specific to individual categories. In Chapter 3, we propose a consistent joint estimation method for two precision matrices, building on the work of Wu (2017). Furthermore, these estimators are applied to formulate a high-dimensional quadratic discriminant analysis (QDA) rule, for which we derive the convergence rate of the classification error.
In many genetic multiple testing applications, the signs of the test statistics provide important directional information. For example, in RNA-seq data analysis, a negative sign could suggest that the expression of the corresponding gene is potentially suppressed, while a positive sign could indicate a potentially elevated expression level. However, most existing procedures that control the false discovery rate (FDR) ignore such valuable information. In Chapter 4, we extend the covariate and direction adaptive knockoff procedure (Tian, 2020) by implementing powerful predictive functions. Through simulation studies and real data analysis, we show that our procedures are competitive with existing covariate-adaptive methods. The companion R package Codak is available.
dc.identifier.uri	https://hdl.handle.net/10012/21377
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.title	High-Dimensional Statistical Inference and False Discovery Rate Control with Covariates
dc.type	Doctoral Thesis
uws-etd.degree	Doctor of Philosophy
uws-etd.degree.department	Statistics and Actuarial Science
uws-etd.degree.discipline	Statistics
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	1 year
uws.contributor.advisor	Qin, Yingli
uws.contributor.advisor	Liang, Kun
uws.contributor.affiliation1	Faculty of Mathematics
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Zheng_Liyuan.pdf
Size: 1.21 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission

Collections

Theses
Statistics and Actuarial Science