Explorations in Pairwise Measures of Dependence and Pooled Significance
dc.contributor.author | Salahub, Chris | |
dc.date.accessioned | 2024-01-22T16:25:16Z | |
dc.date.available | 2024-01-22T16:25:16Z | |
dc.date.issued | 2024-01-22 | |
dc.date.submitted | 2024-01-16 | |
dc.description.abstract | In the exploration of data sets with many variables, the search for interesting pairs is often the first step of analysis. This search builds a road map of the entirety of data before looking at its details, and can provide indispensable inspiration for deeper investigation. Challenges are present, however, in adjusting results to address the multiple testing problem and choosing a measure with sufficient generality to detect many forms of dependence. This work proposes the measurement of statistical dependence by recursive binning of marginal ranks as a flexible measure of dependence. Simulation studies are used to characterize the null distribution and demonstrate the method’s sensitivity to different data patterns. By splitting bins randomly, the χ² statistic has a null distribution conservatively approximated by the χ² distribution seemingly without a loss of power compared to maximized splitting rules, which have inflated statistic values. The method is demonstrated on real S&P 500 constituent data. To adjust for multiple testing, a new framework and coefficient are devised with appropriate proofs for analyzing pooled p-values based on their tendency to detect concentrated or diffuse evidence. This motivates a pooled p-value based on the χ² quantile function as a way to adjust for multiple testing while controlling the family-wise error rate and fine-tuning for the evidence pattern of interest. Simulation studies suggest this method is similarly powerful to the uniformly most powerful method while being more robust to mis-specification. Both the recursive binning measurement of association and the χ² pooled p-value are then demonstrated for genetic data after a tutorial introducing the relevant genetic concepts. A method of moments adjustment of the χ² pooled p-value to account for correlation between tests is introduced and used with genomic and phenomic data from mice to identify regions of interest.
The use of pooled p-values to combine parameter estimates in meta-analysis is also explored, establishing the concepts of evidential intervals and demonstrating their behaviour on simulated data. | en |
dc.identifier.uri | http://hdl.handle.net/10012/20262 | |
dc.language.iso | en | en |
dc.pending | false | |
dc.publisher | University of Waterloo | en |
dc.subject | exploratory data analysis | en |
dc.subject | association | en |
dc.subject | dependence | en |
dc.subject | pooled p-values | en |
dc.subject | multiple testing | en |
dc.subject | genomics | en |
dc.subject | tree-based binning | en |
dc.subject | meta-analysis | en |
dc.title | Explorations in Pairwise Measures of Dependence and Pooled Significance | en |
dc.type | Doctoral Thesis | en |
uws-etd.degree | Doctor of Philosophy | en |
uws-etd.degree.department | Statistics and Actuarial Science | en |
uws-etd.degree.discipline | Statistics | en |
uws-etd.degree.grantor | University of Waterloo | en |
uws-etd.embargo.terms | 0 | en |
uws.contributor.advisor | Olford, Wayne | |
uws.contributor.affiliation1 | Faculty of Mathematics | en |
uws.peerReviewStatus | Unreviewed | en |
uws.published.city | Waterloo | en |
uws.published.country | Canada | en |
uws.published.province | Ontario | en |
uws.scholarLevel | Graduate | en |
uws.typeOfResource | Text | en |