Private Distribution Learning with Public Data

Date

2024-01-22

Authors

Bie, Alex

Advisor

Kamath, Gautam
Ben-David, Shai

Publisher

University of Waterloo

Abstract

We study the problem of private distribution learning with access to public data. In this setup, a learner is given both public and private samples drawn from an unknown distribution 𝑝 belonging to a class 𝑄, and must output an estimate of 𝑝 while adhering to privacy constraints (here, pure differential privacy) only with respect to the private samples. Our setting is motivated by the privacy-utility tradeoff: algorithms satisfying the mathematical definition of differential privacy offer provable privacy guarantees for the data they operate on, but, owing to this constraint, exhibit degraded accuracy. In particular, there are classes 𝑄 that are learnable when privacy is not a concern, but on which any algorithm satisfying pure differential privacy must fail. We show that in several scenarios, a small amount of public data suffices to evade such impossibility results. We complement these positive results with an analysis of how much public data is necessary to see such improvements. Our main result is that to learn the class of all Gaussians in ℝᵈ under pure differential privacy, 𝑑+1 public samples suffice while 𝑑 public samples are necessary.
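One natural way public data can help in this setting is by localizing the unknown distribution before a private estimator is applied. The sketch below is a hypothetical illustration of that idea, not the thesis's algorithm: a handful of public samples center a clipping region, and a Laplace-noised clipped mean is then released under pure ε-differential privacy with respect to the private samples. The function name, sensitivity accounting, and parameters are all illustrative assumptions.

```python
import numpy as np

def private_mean_with_public_data(private_x, public_x, eps, radius, seed=0):
    """Hypothetical sketch: localize with public data, then release a
    clipped mean with Laplace noise (pure eps-DP w.r.t. private_x)."""
    # Coarse, non-private estimate from the (unprotected) public samples.
    center = public_x.mean(axis=0)
    # Clip private samples into a box of the given radius around the center,
    # bounding each sample's influence on the released statistic.
    clipped = np.clip(private_x, center - radius, center + radius)
    n, d = clipped.shape
    # L1 sensitivity of the clipped mean: swapping one private sample moves
    # each of the d coordinates by at most 2*radius/n, so 2*radius*d/n total.
    scale = 2.0 * radius * d / (n * eps)
    noise = np.random.default_rng(seed).laplace(scale=scale, size=d)
    return clipped.mean(axis=0) + noise
```

With, say, a standard Gaussian and a few thousand private samples, the released estimate lands close to the true mean; the point of the sketch is only that the public samples let the clipping region be chosen without spending any privacy budget.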

Keywords

differential privacy, machine learning, density estimation, theory of machine learning
