Explorations in Pairwise Measures of Dependence and Pooled Significance

dc.contributor.authorSalahub, Chris
dc.date.accessioned2024-01-22T16:25:16Z
dc.date.available2024-01-22T16:25:16Z
dc.date.issued2024-01-22
dc.date.submitted2024-01-16
dc.description.abstractIn the exploration of data sets with many variables, the search for interesting pairs is often the first step of analysis. This search builds a road map of the entirety of data before looking at its details, and can provide indispensable inspiration for deeper inves- tigation. Challenges are present, however, in adjusting results to address the multiple testing problem and choosing a measure with sufficient generality to detect many forms of dependence. This work proposes the measurement of statistical dependence by recursive binning of marginal ranks as a flexible measure of dependence. Simulation studies are used to characterize the null distribution and demonstrate the method’s sensitivity to different data patterns. By splitting bins randomly, the χ2 statistic has a null distribution conservatively approximated by the χ2 distribution seemingly without a loss of power compared to maximized splitting rules, which has an inflated statistic value. The method is demonstrated on real S&P 500 constituent data. To adjust for multiple testing, a new framework and coefficient are devised with appropriate proofs for analyzing pooled p-values based on their tendency to detect concentrated or diffuse evidence. This motivates a pooled p-value based on the χ2 quantile function as a way to adjust for multiple testing while controlling the family-wise error rate and fine-tuning for the evidence pattern of interest. Simulation studies suggest this method is similarly powerful to the uniformly most powerful method while being more robust to mis-specification. Both the recursive binning measurement of association and the χ2 pooled p-value are then demonstrated for genetic data after a tutorial introducing the relevant genetic concepts. A method of moments adjustment of the χ2 pooled p-value to account for correlation between tests is introduced and used with genomic and phenomic data from mice to identify regions of interest. The use of pooled p-values to combine parameter estimates in meta-analysis is also explored, establishing the concepts of evidential intervals and demonstrating their behaviour on simulated data.en
dc.identifier.urihttp://hdl.handle.net/10012/20262
dc.language.isoenen
dc.pendingfalse
dc.publisherUniversity of Waterlooen
dc.subjectexploratory data analysisen
dc.subjectassociationen
dc.subjectdependenceen
dc.subjectpooled p-valuesen
dc.subjectmultiple testingen
dc.subjectgenomicsen
dc.subjecttree-based binningen
dc.subjectmeta-analysisen
dc.titleExplorations in Pairwise Measures of Dependence and Pooled Significanceen
dc.typeDoctoral Thesisen
uws-etd.degreeDoctor of Philosophyen
uws-etd.degree.departmentStatistics and Actuarial Scienceen
uws-etd.degree.disciplineStatisticsen
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.embargo.terms0en
uws.contributor.advisorOlford, Wayne
uws.contributor.affiliation1Faculty of Mathematicsen
uws.peerReviewStatusUnrevieweden
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Salahub_Chris.pdf
Size:
8.92 MB
Format:
Adobe Portable Document Format
Description:

License bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
6.4 KB
Format:
Item-specific license agreed upon to submission
Description: