Private Distribution Learning with Public Data
Date
2024-01-22
Authors
Bie, Alex
Advisor
Kamath, Gautam
Ben-David, Shai
Publisher
University of Waterloo
Abstract
We study the problem of private distribution learning with access to public data. In this setting, a learner is given both public and private samples drawn from an unknown distribution 𝑝 belonging to a class 𝑄, and must output an estimate of 𝑝 while satisfying privacy constraints (here, pure differential privacy) with respect to the private samples only. Our setting is motivated by the privacy-utility tradeoff: algorithms satisfying the mathematical definition of differential privacy offer provable privacy guarantees for the data they operate on, but this constraint comes at a cost in accuracy. In particular, there are classes 𝑄 that are learnable when privacy is not a concern, but on which any algorithm satisfying pure differential privacy must fail. We show that in several scenarios, a small amount of public data suffices to evade such impossibility results. We complement these positive results with an analysis of how much public data is necessary to see such improvements. Our main result is that, to learn the class of all Gaussians in ℝᵈ under pure differential privacy, 𝑑+1 public samples suffice while 𝑑 public samples are necessary.
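The abstract's setup can be illustrated in one dimension. The sketch below is not the thesis's algorithm; it is a minimal, assumed example of how a few public samples can help a pure-DP estimator: public samples (which carry no privacy constraint) localize the mean, letting the learner pick a clipping interval, after which the standard Laplace mechanism yields a pure ε-DP estimate from the private samples. The function name `private_mean`, the clipping radius, and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_mean(private, center, radius, epsilon):
    """Pure eps-DP mean estimate via the Laplace mechanism.

    Clipping each private sample to [center - radius, center + radius]
    bounds any single sample's influence on the mean by
    2 * radius / n, so adding Laplace noise of scale
    (2 * radius / n) / epsilon gives pure epsilon-DP.
    """
    n = len(private)
    clipped = np.clip(private, center - radius, center + radius)
    sensitivity = 2 * radius / n
    return clipped.mean() + rng.laplace(scale=sensitivity / epsilon)

# A handful of public samples localize the unknown mean; without
# such localization, pure DP cannot learn over an unbounded range.
public = rng.normal(5.0, 1.0, size=3)
private = rng.normal(5.0, 1.0, size=10_000)

est = private_mean(private, center=public.mean(), radius=10.0, epsilon=1.0)
```

With many private samples the added Laplace noise is small, so `est` lands near the true mean; the thesis studies the analogous question for all Gaussians in ℝᵈ, where the dimension-dependent answer (𝑑+1 public samples suffice, 𝑑 are necessary) is the main result.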
Keywords
differential privacy, machine learning, density estimation, theory of machine learning