Learn Privacy-friendly Global Gaussian Processes in Federated Learning

Yu, Haolin

Files

Yu_Haolin.pdf (621.5 KB)

Date

2022-08-17

Authors

Yu, Haolin

Advisor

Poupart, Pascal

Publisher

University of Waterloo

Abstract

In the era of big data, Federated Learning (FL) has drawn great attention because it naturally operates on distributed computational resources without the need for data warehousing. Like Distributed Learning (DL), FL distributes most computational tasks to end devices, but it places greater emphasis on preserving client privacy. In other words, an FL algorithm should not transmit raw client data, or even information about that data, if doing so could leak private information. As a result, in typical scenarios where the FL framework applies, clients commonly have, or can obtain, too little training data to produce an accurate model. To decide whether a prediction is trustworthy, it is beneficial to use models that provide not only point estimates but also some notion of confidence. The Gaussian Process (GP) is a powerful Bayesian model that comes with naturally well-calibrated variance estimates. However, it is challenging to learn a stand-alone global GP, since merging local kernels leads to privacy leakage. To preserve privacy, previous works on federated GPs avoid learning a global model, either by focusing on the personalized setting or by learning an ensemble of local models. In this work, we present Federated Bayesian Neural Regression (FedBNR), an algorithm that learns a scalable, stand-alone global federated GP that respects clients' privacy. We incorporate deep kernel learning and random features for scalability by defining a unifying random kernel. We show that this random kernel can recover any stationary kernel and many non-stationary kernels. We then derive a principled approach to learning a global predictive model as if all client data were centralized. We also learn global kernels with knowledge distillation methods for clients whose data are not independent and identically distributed (non-i.i.d.). We design synthetic experiments to illustrate scenarios where our model has a clear advantage and to provide insight into the reasons behind it. Experiments on real-world regression datasets show statistically significant improvements over other federated GP models.
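
The idea of learning a global predictive model "as if all client data were centralized" can be illustrated with a minimal sketch. It is not the FedBNR algorithm from the thesis: it assumes a fixed random Fourier feature map for an RBF kernel (no deep kernel learning, no knowledge distillation) and plain Bayesian linear regression on those features. All function names, the feature count D, the noise and prior variances, and the toy data are hypothetical choices made for illustration. Each client shares only two aggregate statistics of its features, and the server sums them. The sketch makes no privacy guarantee of its own.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Random Fourier features whose inner products approximate an RBF kernel."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def client_statistics(X, y, W, b):
    """Computed locally on a client: only these two aggregates are shared."""
    Phi = random_fourier_features(X, W, b)
    return Phi.T @ Phi, Phi.T @ y

def posterior_from_statistics(A, c, noise_var=0.1, prior_var=1.0):
    """Gaussian posterior over feature weights (Bayesian linear regression)."""
    D = A.shape[0]
    precision = A / noise_var + np.eye(D) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ c / noise_var
    return mean, cov

def predict(X_new, mean, cov, W, b, noise_var=0.1):
    """Predictive mean and variance at new inputs."""
    Phi = random_fourier_features(X_new, W, b)
    mu = Phi @ mean
    var = np.einsum("nd,de,ne->n", Phi, cov, Phi) + noise_var
    return mu, var

rng = np.random.default_rng(0)
D, lengthscale = 50, 1.0  # assumed values, not taken from the thesis
# The same random projection is shared by the server and all clients.
W = rng.normal(scale=1.0 / lengthscale, size=(1, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

# Three clients, each observing a different input range (a non-i.i.d. toy setup).
clients = []
for lo, hi in [(-3.0, -1.0), (-1.0, 1.0), (1.0, 3.0)]:
    X = rng.uniform(lo, hi, size=(20, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)
    clients.append((X, y))

# Federated: the server sums the per-client statistics.
stats = [client_statistics(X, y, W, b) for X, y in clients]
A_fed = sum(A for A, _ in stats)
c_fed = sum(c for _, c in stats)
mean_fed, cov_fed = posterior_from_statistics(A_fed, c_fed)

# Centralized reference: pool all raw data in one place.
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
A_cen, c_cen = client_statistics(X_all, y_all, W, b)
mean_cen, cov_cen = posterior_from_statistics(A_cen, c_cen)

mu, var = predict(np.linspace(-3.0, 3.0, 5).reshape(-1, 1), mean_fed, cov_fed, W, b)
```

Because the two statistics are sums over data points, adding them across clients yields the same posterior as pooling the raw data, up to floating-point rounding. The predictive variance returned by `predict` is the confidence estimate that the abstract argues is useful when individual clients hold little data.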