Differentially Private Learning with Noisy Labels
Supervised machine learning tasks require large labelled datasets. However, obtaining such datasets is difficult and often yields noisy labels due to human error or adversarial perturbation. Recent studies have proposed multiple methods to tackle this problem in the non-private setting, yet it remains unsolved when the dataset is private. In this work, we aim to train a model on a sensitive dataset that contains noisy labels such that (i) the model has high test accuracy and (ii) the training process satisfies (ε,δ)-differential privacy. Noisy labels, as studied in our work, are generated by flipping labels in the training set from the true source label(s) to other target label(s). Our approach, Diffindo, constructs a differentially private stochastic gradient descent algorithm which removes suspicious points based on their noisy gradients. We show experiments on datasets across multiple domains with different class-balance properties. Our results show that the proposed algorithm can remove up to 100% of the points with noisy labels in the private scenario while restoring the precision of the targeted label and the test accuracy to their no-noise counterparts.
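The abstract builds on the standard DP-SGD recipe (per-example gradient clipping plus calibrated Gaussian noise). The sketch below illustrates that recipe for logistic regression, together with a purely illustrative outlier filter on gradient norms; the filter threshold, the `flag_suspicious` helper, and all parameter values are assumptions for exposition, not Diffindo's actual removal criterion.

```python
import numpy as np

def dp_sgd_step(weights, X, y, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One DP-SGD step for logistic regression: clip each per-example
    gradient to L2 norm `clip`, sum, add Gaussian noise with standard
    deviation sigma * clip, then average and descend."""
    rng = rng if rng is not None else np.random.default_rng(0)
    preds = 1.0 / (1.0 + np.exp(-X @ weights))
    per_example = (preds - y)[:, None] * X              # per-example gradients
    norms = np.linalg.norm(per_example, axis=1)
    factors = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    clipped = per_example * factors[:, None]            # clip to L2 <= clip
    noise = rng.normal(0.0, sigma * clip, size=weights.shape)
    grad = (clipped.sum(axis=0) + noise) / len(X)
    return weights - lr * grad, norms

def flag_suspicious(norms, k=2.0):
    """Illustrative assumption (not the thesis's actual rule): flag
    points whose gradient norm exceeds mean + k standard deviations."""
    return norms > norms.mean() + k * norms.std()
```

A flipped label tends to produce an unusually large gradient, which is why a norm-based filter of this general shape can surface mislabelled points; the privacy accounting for (ε,δ)-DP would be done over the noise multiplier `sigma` and the number of steps, which this fragment does not implement.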
Cite this version of the work
Shubhankar Mohapatra (2020). Differentially Private Learning with Noisy Labels. UWSpace. http://hdl.handle.net/10012/15939