Explorations in Pairwise Measures of Dependence and Pooled Significance

Salahub, Chris

Explorations in Pairwise Measures of Dependence and Pooled Significance

Files

Salahub_Chris.pdf (8.92 MB)

Date

2024-01-22

Authors

Salahub, Chris

Advisor

Olford, Wayne

Publisher

University of Waterloo

Abstract

In the exploration of data sets with many variables, the search for interesting pairs is often the first step of analysis. This search builds a road map of the entirety of data before looking at its details, and can provide indispensable inspiration for deeper inves- tigation. Challenges are present, however, in adjusting results to address the multiple testing problem and choosing a measure with sufficient generality to detect many forms of dependence. This work proposes the measurement of statistical dependence by recursive binning of marginal ranks as a flexible measure of dependence. Simulation studies are used to characterize the null distribution and demonstrate the method’s sensitivity to different data patterns. By splitting bins randomly, the χ2 statistic has a null distribution conservatively approximated by the χ2 distribution seemingly without a loss of power compared to maximized splitting rules, which has an inflated statistic value. The method is demonstrated on real S&P 500 constituent data. To adjust for multiple testing, a new framework and coefficient are devised with appropriate proofs for analyzing pooled p-values based on their tendency to detect concentrated or diffuse evidence. This motivates a pooled p-value based on the χ2 quantile function as a way to adjust for multiple testing while controlling the family-wise error rate and fine-tuning for the evidence pattern of interest. Simulation studies suggest this method is similarly powerful to the uniformly most powerful method while being more robust to mis-specification. Both the recursive binning measurement of association and the χ2 pooled p-value are then demonstrated for genetic data after a tutorial introducing the relevant genetic concepts. A method of moments adjustment of the χ2 pooled p-value to account for correlation between tests is introduced and used with genomic and phenomic data from mice to identify regions of interest. The use of pooled p-values to combine parameter estimates in meta-analysis is also explored, establishing the concepts of evidential intervals and demonstrating their behaviour on simulated data.

Keywords

exploratory data analysis, association, dependence, pooled p-values, multiple testing, genomics, tree-based binning, meta-analysis

URI

http://hdl.handle.net/10012/20262

Collections

Theses
Statistics and Actuarial Science

Full item page

Explorations in Pairwise Measures of Dependence and Pooled Significance

Files

Date

Authors

Advisor

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

LC Subject Headings

Citation

URI

Collections