Explorations in Pairwise Measures of Dependence and Pooled Significance

Salahub, Chris

dc.contributor.author	Salahub, Chris
dc.date.accessioned	2024-01-22 16:25:16 (GMT)
dc.date.available	2024-01-22 16:25:16 (GMT)
dc.date.issued	2024-01-22
dc.date.submitted	2024-01-16
dc.identifier.uri	http://hdl.handle.net/10012/20262
dc.description.abstract	In the exploration of data sets with many variables, the search for interesting pairs is often the first step of analysis. This search builds a road map of the entire data set before looking at its details, and can provide indispensable inspiration for deeper investigation. Challenges are present, however, in adjusting results to address the multiple testing problem and in choosing a measure with sufficient generality to detect many forms of dependence. This work proposes recursive binning of marginal ranks as a flexible measure of statistical dependence. Simulation studies are used to characterize the null distribution and demonstrate the method’s sensitivity to different data patterns. When bins are split randomly, the χ² statistic has a null distribution conservatively approximated by the χ² distribution, seemingly without a loss of power compared to maximized splitting rules, which inflate the value of the statistic. The method is demonstrated on real S&P 500 constituent data. To adjust for multiple testing, a new framework and coefficient are devised, with appropriate proofs, for analyzing pooled p-values based on their tendency to detect concentrated or diffuse evidence. This motivates a pooled p-value based on the χ² quantile function as a way to adjust for multiple testing while controlling the family-wise error rate and fine-tuning for the evidence pattern of interest. Simulation studies suggest this method is comparably powerful to the uniformly most powerful method while being more robust to misspecification. Both the recursive binning measurement of association and the χ² pooled p-value are then demonstrated for genetic data after a tutorial introducing the relevant genetic concepts. A method of moments adjustment of the χ² pooled p-value to account for correlation between tests is introduced and used with genomic and phenomic data from mice to identify regions of interest.
The use of pooled p-values to combine parameter estimates in meta-analysis is also explored, establishing the concepts of evidential intervals and demonstrating their behaviour on simulated data.	en
dc.language.iso	en	en
dc.publisher	University of Waterloo	en
dc.subject	exploratory data analysis	en
dc.subject	association	en
dc.subject	dependence	en
dc.subject	pooled p-values	en
dc.subject	multiple testing	en
dc.subject	genomics	en
dc.subject	tree-based binning	en
dc.subject	meta-analysis	en
dc.title	Explorations in Pairwise Measures of Dependence and Pooled Significance	en
dc.type	Doctoral Thesis	en
dc.pending	false
uws-etd.degree.department	Statistics and Actuarial Science	en
uws-etd.degree.discipline	Statistics	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Olford, Wayne
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.typeOfResource	Text	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en

Files in this item

Name: Salahub_Chris.pdf
Size: 8.923Mb
Format: PDF
