CORK: A privacy-preserving and lossless federated learning scheme for deep neural network
Introduction
In recent years, deep neural network (DNN) models have been widely applied and have achieved great success in many fields (e.g., natural language processing [1], computer vision [2], and human–machine games [3]), bringing great convenience to people's lives. Meanwhile, due to the explosive growth of data generated by distributed devices, coupled with privacy concerns over data collection, the concept of federated learning was introduced by Google [4]: training a high-quality global model over multiple participants while keeping their data localized, as shown in Fig. 1. In particular, during each training round of federated learning, participants first run a stochastic gradient descent (SGD) algorithm locally on their training data to generate local updates. These local updates are then aggregated by the aggregation server to update the global model parameters.
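The per-round procedure described above can be sketched as follows. This is a minimal illustration of federated averaging, not CORK's actual protocol; the least-squares loss in `local_sgd` is a toy stand-in for DNN training:

```python
import numpy as np

def local_sgd(w, data, lr=0.1):
    # One SGD step on a least-squares loss (a toy stand-in for DNN training).
    X, y = data
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def federated_round(global_w, participants, lr=0.1):
    # Each participant updates the model locally on its own data;
    # the aggregation server then averages the local models.
    local_models = [local_sgd(global_w, d, lr) for d in participants]
    return np.mean(local_models, axis=0)
```

In the real setting the averaging step is exactly where the aggregation server sees the participants' local updates, which motivates the privacy mechanisms discussed next.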
Nevertheless, federated learning still suffers from many privacy issues. Previous research has demonstrated that inference attacks can reveal sensitive information about participants from both local updates and global model parameters [5], [6]. On the one hand, local updates are derived from participants' training data and therefore contain a great deal of sensitive information. By analyzing the model updates of just a few rounds, even without any auxiliary knowledge, the aggregation server can infer information about a particular participant's data (i.e., membership, class representation, or properties), or even reconstruct its raw local training data [7], [8], [9]. On the other hand, since the DNN model internally mines useful information from the training data, the global model parameters also carry sensitive information. For example, by computing the difference between consecutive global model parameters, an honest-but-curious participant can obtain the aggregated local updates of the other participants and then infer the membership and properties of their data in a given training round [10]. Therefore, there is an urgent need to design privacy-preserving federated learning schemes for the DNN model.
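The gradient-difference observation behind the attack of [10] is simple arithmetic, as the following toy sketch shows (the dimensions, learning rate, and averaging rule are illustrative assumptions, not the attack's exact setting):

```python
import numpy as np

rng = np.random.default_rng(0)
lr = 0.1
w_t = rng.normal(size=4)                          # global parameters at round t
updates = [rng.normal(size=4) for _ in range(3)]  # other participants' local updates
w_next = w_t - lr * np.mean(updates, axis=0)      # server's aggregation step

# An honest-but-curious participant who sees both w_t and w_next can
# recover the averaged local update of everyone else:
recovered = (w_t - w_next) / lr
assert np.allclose(recovered, np.mean(updates, axis=0))
```

This is why protecting local updates alone is insufficient: the consecutive global models themselves leak the aggregated updates.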
To address these privacy issues, plenty of privacy-preserving schemes have been proposed, which can be classified into two main categories: those based on secure aggregation (SA) and those based on differential privacy (DP). Specifically, with an SA algorithm, the aggregation server obtains only the summation of multiple participants' local updates and cannot see the concrete local update of any single participant [11], [12], [13], [14], [15], [16]. As a result, SA-based schemes protect the sensitive data in local updates from the aggregation server well, but they can hardly prevent inference attacks on the global model parameters. Besides SA, DP is another common method for protecting data privacy in federated learning: in popular DP schemes, the local updates are randomly perturbed by the participants at each round [17], [18], [19], [20]. DP-based schemes can provide strong privacy guarantees, but the added noise inevitably reduces the accuracy of the trained model.
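As an illustration of the SA idea, the classic pairwise-masking construction (a generic sketch, not the FTSA algorithm proposed in this paper) can be written as follows: each pair of participants derives a shared mask, one adds it and the other subtracts it, so the masks cancel in the server's sum:

```python
import numpy as np

def pairwise_mask(idx, n, vec_len, seed=42):
    # mask_i = sum over pairs (i, j): +s_ij if i < j, else -s_ij.
    # Each unordered pair derives the same s_ij from a shared seed,
    # so all masks cancel in the global sum.
    mask = np.zeros(vec_len)
    for j in range(n):
        if j == idx:
            continue
        lo, hi = min(idx, j), max(idx, j)
        s = np.random.default_rng(seed + lo * n + hi).normal(size=vec_len)
        mask += s if idx < j else -s
    return mask

n, d = 4, 3
updates = [np.full(d, float(i + 1)) for i in range(n)]        # toy local updates
masked = [u + pairwise_mask(i, n, d) for i, u in enumerate(updates)]
# The server sees only the masked vectors, yet their sum is the true sum.
assert np.allclose(sum(masked), sum(updates))
```

Note that this simple form breaks if a participant drops out (its masks no longer cancel), which is exactly the problem drop-tolerant designs such as FTSA address.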
In this paper, we propose a privacy-preserving and lossless federated learning scheme for DNN, named CORK. With CORK, multiple participants can collaborate to train a DNN model securely and accurately with the assistance of an aggregation server. By combining our proposed drop-tolerant secure aggregation algorithm FTSA and lossless model perturbation mechanism PTSP, sensitive data in both local updates and global model parameters are well protected during training. Furthermore, the neuron pruning operation in PTSP reduces the scale of the model, which significantly improves computation and communication efficiency. Specifically, our contributions are as follows:
- CORK protects sensitive data in both local updates and global model parameters. During training, the local updates are encrypted by FTSA and thus remain confidential to the aggregation server. Besides, PTSP greatly changes the order and values of the global model parameters, which makes it impossible for an honest-but-curious participant to infer others' sensitive data by comparing consecutive global model parameters. Therefore, in CORK, sensitive data are well protected from inference attacks by the aggregation server and other participants.
- CORK achieves lossless and drop-tolerant federated learning for DNN. In federated learning, a participant may drop out midway due to connectivity or power constraints. By applying Shamir secret sharing in FTSA, even if some participants drop out during a training round, the local updates of the remaining participants can still be aggregated. Moreover, PTSP prunes and merges redundant neurons in the DNN model without causing any loss of model accuracy.
- CORK is efficient in both computation and communication. At each training round, the computation and communication overhead is significantly reduced by the neuron pruning operation in PTSP. In addition, evaluations on the real MNIST and CIFAR-10 datasets show that our scheme is effective and efficient and can be deployed in a real environment.
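As a minimal illustration of the Shamir secret sharing that underlies FTSA's drop tolerance (toy field and threshold; not the full FTSA protocol), a secret split into n shares can be reconstructed from any t of them, so up to n - t dropped participants are tolerated:

```python
import random

P = 2 ** 61 - 1  # a Mersenne prime field for the shares

def share(secret, t, n):
    # Random degree-(t-1) polynomial with f(0) = secret; shares are (x, f(x)).
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 over GF(P).
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

shares = share(123456, t=3, n=5)
# Any 3 of the 5 shares suffice, so 2 dropped participants are tolerated.
assert reconstruct(shares[:3]) == 123456
assert reconstruct(shares[2:]) == 123456
```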
Models, Security Requirements and Design Goal
In this section, we first outline the system model, threat model, and security requirements in our scenario. After that, we identify our design goal.
Preliminaries
In this section, we review some preliminaries related to our scheme.
Proposed Scheme
In this section, we present a detailed description of our CORK scheme, which mainly consists of four phases: (1) system initialization; (2) model perturbation and distribution; (3) local training and encryption; (4) secure aggregation and model recovery. The overview of CORK is shown in Fig. 4. At first, TA generates and distributes public parameters and keys. Then, n participants train the model iteratively with the assistance of AS. At each training round, AS perturbs and distributes the global model parameters, each participant trains the model locally and encrypts its local update, and AS securely aggregates the encrypted updates to recover the new global model.
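To give intuition for a lossless pruning step in the spirit of PTSP (this sketch is an illustrative assumption, not the paper's exact mechanism), consider merging hidden neurons with identical incoming weights: since such neurons always produce identical activations, summing their outgoing weights leaves the network function unchanged:

```python
import numpy as np

def merge_duplicate_neurons(W_in, W_out):
    # W_in: (d_in, d_hidden) incoming weights; W_out: (d_hidden, d_out) outgoing.
    # Hidden neurons with identical incoming columns have identical activations
    # for every input, so their outgoing rows can be summed without changing
    # the network's output -- a lossless reduction of the hidden layer.
    keep, out_rows, seen = [], [], {}
    for h in range(W_in.shape[1]):
        key = W_in[:, h].tobytes()
        if key in seen:
            out_rows[seen[key]] += W_out[h]
        else:
            seen[key] = len(keep)
            keep.append(h)
            out_rows.append(W_out[h].copy())
    return W_in[:, keep], np.array(out_rows)
```

Because the merge preserves the function exactly (for any activation applied elementwise to the hidden layer), no accuracy is lost while the model shrinks.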
Security Analysis
In this section, we analyze the security of CORK. Specifically, corresponding to the threat model and security requirements discussed in Section 2, we mainly focus on protecting the sensitive data in global model parameters and local updates from inference attacks.
Performance Evaluation
In this section, we evaluate and analyze the performance of CORK in terms of accuracy, computation cost, and communication overhead, and compare it with a multiparty deep learning scheme called MDL [12]. Based on asynchronous optimization, homomorphic encryption, and threshold secret sharing, MDL ensures that participants learn the global model only if a sufficient number of local updates are aggregated. Specifically, MDL uses ElGamal encryption to achieve the secure aggregation of local updates.
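For intuition on how ElGamal-style encryption can support aggregation under encryption, here is a toy sketch of exponential ElGamal, whose ciphertexts multiply to encrypt the sum of plaintexts (parameters far too small to be secure, and not MDL's actual construction):

```python
import random

# Toy exponential ElGamal: Enc(m) = (g^r, g^m * h^r) mod p.
p, g = 1_000_003, 2              # small prime modulus, for illustration only
x = random.randrange(2, p - 1)   # decryption key (held by the aggregator side)
h = pow(g, x, p)                 # public key

def enc(m):
    r = random.randrange(2, p - 1)
    return pow(g, r, p), (pow(g, m, p) * pow(h, r, p)) % p

def add(c1, c2):
    # Componentwise product of ciphertexts encrypts the sum of plaintexts.
    return (c1[0] * c2[0]) % p, (c1[1] * c2[1]) % p

def dec(c, bound=1000):
    gm = (c[1] * pow(c[0], -x, p)) % p       # recover g^m
    return next(m for m in range(bound) if pow(g, m, p) == gm)

# Two participants' toy "updates" aggregated under encryption:
assert dec(add(enc(5), enc(7))) == 12
```

The brute-force discrete-log step in `dec` is why such schemes only suit small plaintext ranges, which is one source of the computation overhead measured in this section.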
Related Work
In this section, we review related work on privacy-preserving federated learning for DNN. We classify the existing schemes into two categories: those based on SA [29], [12], [13], [30], [31], [32], [33], [16] and those based on DP [34], [18], [35], [36], [19], [20].
Conclusion
In this paper, we proposed CORK, a privacy-preserving and lossless federated learning scheme for DNN. Based on our proposed drop-tolerant secure aggregation algorithm FTSA, CORK ensures that participants' local updates remain confidential to AS, and that training proceeds normally even if some participants drop out. Besides, by applying our lossless model perturbation mechanism PTSP, a participant cannot infer others' sensitive data from the global model parameters, while the model accuracy remains lossless.
CRediT authorship contribution statement
Jiaqi Zhao: Conceptualization, Writing - original draft, Software. Hui Zhu: Project administration, Funding acquisition. Fengwei Wang: Writing - review & editing. Rongxing Lu: Methodology. Hui Li: Supervision. Jingwei Tu: Funding acquisition. Jie Shen: Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported by National Natural Science Foundation of China (61972304, 61932015), Science Foundation of the Ministry of Education (MCM20200101), and Shaanxi Provincial Key Research and Development Program (2020ZDLGY08-04).
References (36)
- et al., Face recognition based on extreme learning machine, Neurocomputing, 2011.
- et al., A privacy-preserving and non-interactive federated learning scheme for regression training with gradient descent, Inf. Sci., 2021.
- et al., Privacy-preserving and verifiable online crowdsourcing with worker updates, Inf. Sci., 2021.
- Deep learning in neural networks: An overview, Neural Networks, 2015.
- et al., Recent trends in deep learning based natural language processing, IEEE Comput. Intell. Mag., 2018.
- et al., Mastering the game of go without human knowledge, Nature, 2017.
- J. Konecný, H.B. McMahan, F.X. Yu, P. Richtárik, A.T. Suresh, D. Bacon, Federated learning: Strategies for improving...
- et al., SoK: Security and privacy in machine learning, EuroS&P, IEEE, 2018.
- L. Lyu, H. Yu, Q. Yang, Threats to federated learning: A survey, CoRR abs/2003.02133.
- et al., Privacy-preserving deep learning via additively homomorphic encryption, IEEE Trans. Inf. Forensics Secur., 2018.
- Exploiting unintended feature leakage in collaborative learning.