|dc.description.abstract||The predictive power of deep learning models depends on the size and quality of the training data. Access to large-scale datasets enables a model to estimate the underlying data distribution more precisely. Deep models therefore rely on training datasets that are ideally aggregated from various sources. In the medical domain, however, privacy considerations may make it impossible to build large-scale datasets in one central location. Centralized learning methods for medical imaging assume that training data are collected from different medical centers (e.g., hospitals and clinics) and transferred to a central location, commonly called a server. Yet hospitals are generally unwilling to share their medical records with external collaborators because of patient privacy and regulatory compliance requirements. As a result, the lack of publicly available, large-scale, diverse datasets hinders model development in healthcare. Decentralized learning methods offer a promising way to overcome these challenges, preserving data privacy while still enabling general models to be trained on data from different sources. One such method, federated learning, allows training on multi-site datasets without direct access to the data. Because it keeps data local, federated learning has emerged as a promising way to protect sensitive user data. This decentralized paradigm plays a critical role in privacy-sensitive applications and opens new horizons for secure decentralized learning.
This research focuses on privacy-preserving federated learning and addresses two key challenges in the field: the privacy of training results and fairness in aggregating them.
The first challenge is that training results are as sensitive as training samples, because they may leak private information. To address this challenge, this thesis adopts secure multi-party computation and proposes a framework that enables participating hospitals to share their training results while preserving privacy. The second challenge is that the collaboratively learned global model should perform acceptably at each individual site. However, existing methods rely on model averaging. Because data are not independent and identically distributed (non-IID) across hospitals, this can produce a biased model that performs well for some hospitals but poorly for others. This thesis addresses the second challenge by introducing a novel federated learning scheme, Proportionally Fair Federated Learning (Prop-FFL), which improves model fairness among participating hospitals. Prop-FFL modifies the aggregation rule at the central server to account for varying site contributions. It is based on a novel optimization objective designed to reduce performance variation among hospitals. Experiments were conducted on data from The Cancer Genome Atlas (TCGA), a publicly available repository. The experimental results suggest that the proposed scheme performs competitively with the baseline and benchmark schemes.||en