Ancestry Deconvolution via Differential Privacy

dc.contributor.author: Chowdhury, Raiyan
dc.date.accessioned: 2026-02-05T13:45:51Z
dc.date.available: 2026-02-05T13:45:51Z
dc.date.issued: 2026-02-05
dc.date.submitted: 2026-01-23
dc.description.abstract: This thesis presents the first study of ancestry determination under differential privacy (DP). Direct-to-consumer genomics companies, such as 23andMe, offer ancestry testing to millions of individuals, yet remain vulnerable to severe data breaches. Such incidents are especially concerning because genomic data is uniquely identifying, highly correlated, and permanent once exposed. 23andMe disclosed a catastrophic breach in October 2023 that compromised the genetic profiles of an estimated 6.9 million users, underscoring the urgent need for stronger privacy guarantees in genomic analysis. In this work, we investigate the application of DP to ancestry deconvolution. Using the 1000 Genomes dataset and Gnomix, a state-of-the-art ancestry inference model, we evaluate how privatizing single nucleotide polymorphism (SNP) data affects ancestry classification accuracy. We implement both naïve and correlation-aware local differential privacy (LDP) mechanisms across varying privacy budgets, enabling a systematic study of the privacy-utility trade-off in ancestry inference. Our results demonstrate that while naïve DP perturbations significantly degrade accuracy, correlation-aware LDP mechanisms preserve substantially more predictive power by accounting for linkage disequilibrium (LD). This thesis establishes a foundation for private ancestry deconvolution, providing an empirical benchmark of state-of-the-art DP methods in genomics and highlighting both the challenges and potential of integrating DP into ancestry testing.
dc.identifier.uri: https://hdl.handle.net/10012/22926
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.relation.uri: https://aws.amazon.com/marketplace/pp/prodview-bcuxxyvo4twb2#resources
dc.subject: differential privacy
dc.subject: ancestry
dc.subject: genomics
dc.title: Ancestry Deconvolution via Differential Privacy
dc.type: Master Thesis
uws-etd.degree: Master of Mathematics
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0
uws.contributor.advisor: Kerschbaum, Florian
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]
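
The abstract contrasts naïve and correlation-aware LDP mechanisms but does not spell out either one. As a rough illustration of what a naïve baseline could look like, the sketch below applies k-ary randomized response independently to each SNP genotype, encoded as a minor-allele count in {0, 1, 2}. The function names, the genotype encoding, and the choice of randomized response are assumptions made for illustration; they are not taken from the thesis and may differ from the mechanisms it evaluates.

```python
import math
import random

# Assumed encoding: minor-allele count at a biallelic SNP.
GENOTYPES = (0, 1, 2)


def keep_probability(epsilon: float, k: int = len(GENOTYPES)) -> float:
    """Probability that k-ary randomized response reports the true value."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def randomized_response(genotype: int, epsilon: float, rng: random.Random) -> int:
    """Perturb a single genotype so the report satisfies epsilon-LDP."""
    if genotype not in GENOTYPES:
        raise ValueError(f"unexpected genotype: {genotype}")
    if rng.random() < keep_probability(epsilon):
        return genotype
    # Otherwise report one of the other values uniformly at random.
    return rng.choice([g for g in GENOTYPES if g != genotype])


def privatize(genotypes, epsilon_per_snp: float, seed=None):
    """Naive baseline (hypothetical): perturb every SNP independently."""
    rng = random.Random(seed)
    return [randomized_response(g, epsilon_per_snp, rng) for g in genotypes]


if __name__ == "__main__":
    sample = [0, 1, 2, 1, 0, 0, 2]
    for eps in (0.5, 2.0, 8.0):
        print(eps, round(keep_probability(eps), 3), privatize(sample, eps, seed=1))
```

Because each SNP is perturbed on its own, this baseline ignores linkage disequilibrium between neighbouring SNPs, which is the structure the abstract says the correlation-aware mechanisms account for. Under basic composition, perturbing m SNPs this way spends a total budget of m times the per-SNP epsilon, so small per-SNP budgets are needed and accuracy drops quickly.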

Files

Original bundle

Name: Chowdhury_Raiyan.pdf
Size: 433.68 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission
