Information Sciences

Volume 603, July 2022, Pages 190-209

CORK: A privacy-preserving and lossless federated learning scheme for deep neural network

https://doi.org/10.1016/j.ins.2022.04.052

Abstract

With the advance of machine learning technology and especially the explosive growth of big data, federated learning, which allows multiple participants to jointly train a high-quality global machine learning model, has gained extensive attention. However, in federated learning, it has been shown that inference attacks can reveal sensitive information from both local updates and global model parameters, which greatly threatens user privacy. To address this challenge, in this paper, a privacy-preserving and lossless federated learning scheme, named CORK, is proposed for deep neural networks. With CORK, multiple participants can train a global model securely and accurately with the assistance of an aggregation server. Specifically, we first design a drop-tolerant secure aggregation algorithm, FTSA, which ensures the confidentiality of local updates. Then, a lossless model perturbation mechanism, PTSP, is proposed to protect sensitive data in the global model parameters. Furthermore, the neuron pruning operation in PTSP reduces the scale of the model, which significantly improves computation and communication efficiency. Detailed security analysis shows that CORK can resist inference attacks on both local updates and global model parameters. In addition, CORK is implemented on the real MNIST and CIFAR-10 datasets, and the experimental results demonstrate that CORK is indeed effective and efficient.

Introduction

In recent years, the deep neural network (DNN) model has been widely applied and has achieved great success in many fields (e.g., natural language processing [1], computer vision [2], human–machine games [3], and so on), bringing great convenience to people's lives. Meanwhile, due to the explosive growth of data generated by distributed devices, coupled with privacy concerns over data collection, the concept of federated learning was introduced by Google [4]; it essentially trains a high-quality global model over multiple participants while keeping the data localized, as shown in Fig. 1. In particular, during each training round of federated learning, participants first run the stochastic gradient descent (SGD) algorithm locally on their training data to generate local updates. After that, the local updates are aggregated by the aggregation server to update the global model parameters.
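
For intuition, the following minimal sketch (our own illustration, not the authors' protocol) implements one such round for a toy linear model: each participant runs local SGD and returns its weight delta, and the server applies the average of all deltas to the global model.

```python
import numpy as np

# Illustrative federated learning round (toy linear model, not CORK itself):
# participants compute local updates via SGD; the server averages them.

def local_update(global_weights, data, labels, lr=0.01, epochs=1):
    """Run plain SGD on a linear model locally and return the weight delta."""
    w = global_weights.copy()
    for _ in range(epochs):
        for x, y in zip(data, labels):
            pred = x @ w
            grad = (pred - y) * x          # gradient of the squared error
            w -= lr * grad
    return w - global_weights              # local update sent to the server

def server_aggregate(global_weights, updates):
    """Federated averaging: apply the mean of all local updates."""
    return global_weights + np.mean(updates, axis=0)

# toy example with three participants
rng = np.random.default_rng(0)
w_global = np.zeros(5)
participants = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]

for round_id in range(10):
    updates = [local_update(w_global, X, y) for X, y in participants]
    w_global = server_aggregate(w_global, updates)
```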

Nevertheless, there are still many privacy issues in federated learning. Many previous works have demonstrated that inference attacks can reveal sensitive information about participants from both local updates and global model parameters [5], [6]. On the one hand, local updates are derived from participants' training data and thus contain a great deal of sensitive information. By analyzing model updates over just a few rounds, even without any auxiliary knowledge, the aggregation server can infer a certain participant's data information (i.e., membership, class representation, or property), or even reconstruct its raw local training data [7], [8], [9]. On the other hand, since a DNN model internally mines useful information from its training data, the global model parameters also contain sensitive information. For example, by calculating the difference between consecutive global model parameters, an honest-but-curious participant can obtain the aggregated local updates of the other participants and then infer the membership and properties of their data in a certain training round [10]. Therefore, there is an urgent need to design privacy-preserving federated learning schemes for the DNN model.
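
To make the second attack surface concrete, the toy example below (our own illustration with made-up numbers) shows how a curious participant can isolate the aggregated update of all other participants simply by differencing two consecutive global models and subtracting its own contribution.

```python
import numpy as np

# Illustration: with averaged updates, the global model at round t+1 equals the
# model at round t plus the mean of all n local updates.
n = 4                                         # number of participants
w_t = np.array([0.10, -0.20, 0.05])           # global parameters at round t
my_update = np.array([0.02, 0.01, -0.03])     # the curious participant's update
others_sum = np.array([0.05, -0.04, 0.09])    # unknown to the attacker a priori
w_t1 = w_t + (my_update + others_sum) / n     # what the server publishes

# Two consecutive global models plus the attacker's own update are enough
# to recover the other participants' aggregated update:
recovered_others = (w_t1 - w_t) * n - my_update
assert np.allclose(recovered_others, others_sum)
```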

To solve the above privacy issues, numerous privacy-preserving schemes have been proposed, which can mainly be classified into two categories: those based on secure aggregation (SA) and those based on differential privacy (DP). Specifically, with an SA algorithm, the aggregation server can only obtain the summation of multiple participants' local updates, while it cannot see the individual local update of any single participant [11], [12], [13], [14], [15], [16]. As a result, SA-based schemes can well protect the sensitive data in local updates from the aggregation server, but they can hardly prevent inference attacks on the global model parameters. Besides SA, DP is another common method for protecting data privacy in federated learning. In popular DP-based schemes, the local updates are randomly perturbed by the participants at each round [17], [18], [19], [20]. DP-based schemes can provide strong privacy guarantees, but they inevitably reduce the accuracy of the trained model.
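
The contrast between the two lines of work can be seen in a minimal pairwise-masking sketch of SA (in the spirit of [11], not the FTSA algorithm proposed here): the random masks cancel in the sum, so the server learns only the aggregate, whereas a DP-based scheme would instead add non-cancelling noise to each update, which is what costs accuracy.

```python
import numpy as np

# Minimal sketch of mask-based secure aggregation (illustrative only):
# participants i < j share a random mask m_ij; i adds it and j subtracts it,
# so all masks cancel in the sum and the server only sees the aggregate.
rng = np.random.default_rng(42)
updates = [rng.normal(size=3) for _ in range(3)]        # true local updates

masks = {(i, j): rng.normal(size=3) for i in range(3) for j in range(i + 1, 3)}

def masked(i):
    y = updates[i].copy()
    for (a, b), m in masks.items():
        if a == i:
            y += m
        elif b == i:
            y -= m
    return y                                            # what participant i uploads

server_sum = sum(masked(i) for i in range(3))           # server only aggregates
assert np.allclose(server_sum, sum(updates))            # masks cancel exactly
```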

In this paper, we propose a privacy-preserving and lossless federated learning scheme for DNN, named CORK. With CORK, multiple participants can collaborate to train a DNN model securely and accurately with the assistance of an aggregation server. By combining our proposed drop-tolerant secure aggregation algorithm FTSA and lossless model perturbation mechanism PTSP, sensitive data in both the local updates and the global model parameters are well protected during training. Furthermore, the neuron pruning operation in PTSP reduces the scale of the model, which significantly improves computation and communication efficiency. Specifically, our contributions are as follows:

  • CORK protects sensitive data in both local updates and global model parameters. During training, the local updates are encrypted by FTSA, which keeps them confidential from the aggregation server. Besides, PTSP greatly changes the order and values of the global model parameters, which makes it impossible for an honest-but-curious participant to infer others' sensitive data by comparing consecutive global model parameters. Therefore, in CORK, sensitive data are well protected from inference attacks by both the aggregation server and other participants.

  • CORK achieves lossless and drop-tolerant federated learning for DNN. In federated learning, a participant may drop out midway due to connectivity or power constraints. In this regard, by applying Shamir secret sharing in FTSA (see the sketch after this list), even if some participants drop out during a training round, CORK can still aggregate the local updates of the remaining participants. Moreover, PTSP prunes and merges redundant neurons in the DNN model without causing any loss of model accuracy.

  • CORK is efficient in both computation and communication. At each training round, the computation and communication overhead is significantly reduced by the neuron pruning operation in PTSP. In addition, the evaluation on the real MNIST and CIFAR-10 datasets shows that our scheme is effective and efficient and can be deployed in a real environment.
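
As referenced in the second contribution, drop tolerance in such schemes typically stems from Shamir secret sharing, where any t of n shares suffice to reconstruct a secret. The sketch below is a textbook (t, n) Shamir scheme over a prime field, intended only as background intuition rather than the concrete FTSA construction.

```python
import random

# Textbook (t, n) Shamir secret sharing over a prime field (illustrative only,
# not the concrete FTSA construction): any t shares reconstruct the secret,
# so up to n - t participants may drop out without blocking recovery.
P = 2**31 - 1          # a Mersenne prime used as the field modulus

def make_shares(secret, t, n):
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation of the sharing polynomial at x = 0
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(secret=123456, t=3, n=5)
# two participants drop out; any 3 remaining shares still recover the secret
assert reconstruct(shares[:3]) == 123456
assert reconstruct(shares[2:]) == 123456
```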

The rest of this paper is organized as follows. In Section 2, we define the models and state our design goal. In Section 3, we outline some building blocks of CORK. In Section 4, we describe the proposed CORK scheme in detail, followed by the security analysis and performance evaluation in Section 5 and Section 6. Finally, we review the related works in Section 7 and draw a conclusion in Section 8.


Models, Security Requirements and Design Goal

In this section, we first outline the system model, threat model, and security requirements in our scenario. After that, we identify our design goal.

Preliminaries

In this section, we review some preliminaries related to our scheme.

Proposed Scheme

In this section, we present a detailed description of our CORK scheme, which mainly consists of four phases: (1) System initialization; (2) Model perturbation and distribution; (3) Local training and encryption; (4) Secure aggregation and model recovery. The overview of CORK is shown in Fig. 4. At first, TA generates and distributes the public parameters and keys. Then, n participants train the model iteratively with the assistance of AS. At each training round, AS perturbs and distributes the global model parameters to the participants.
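
To give intuition for why the neuron pruning in PTSP can be lossless, the following simplified sketch (our own illustration, assuming exactly duplicated hidden neurons, not the exact PTSP procedure) merges two identical neurons in a dense ReLU layer: the duplicate's outgoing weights are added to its twin's, and the layer's output is unchanged.

```python
import numpy as np

# Simplified illustration (not the exact PTSP procedure): if two hidden neurons
# have identical incoming weights, one can be removed and its outgoing weights
# added to the duplicate's, leaving the network function unchanged.
rng = np.random.default_rng(1)

W1 = rng.normal(size=(4, 3))          # incoming weights: 4 inputs -> 3 hidden
W1[:, 2] = W1[:, 1]                   # make hidden units 1 and 2 duplicates
W2 = rng.normal(size=(3, 2))          # outgoing weights: 3 hidden -> 2 outputs

def forward(x, W_in, W_out):
    h = np.maximum(x @ W_in, 0.0)     # ReLU hidden layer
    return h @ W_out

# merge neuron 2 into neuron 1, then drop neuron 2
W1_pruned = W1[:, :2]
W2_pruned = W2[:2, :].copy()
W2_pruned[1, :] += W2[2, :]

x = rng.normal(size=4)
assert np.allclose(forward(x, W1, W2), forward(x, W1_pruned, W2_pruned))
```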

Security Analysis

In this section, we analyze the security of CORK. Specifically, corresponding to the threat model and security requirements discussed in Section 2, we mainly focus on protecting the sensitive data in global model parameters and local updates from inference attacks.

Performance Evaluation

In this section, we evaluate and analyze the performance of CORK in terms of accuracy, computation cost, and communication overhead, and compare it with a multiparty deep learning scheme called MDL [12]. Based on asynchronous optimization, homomorphic encryption, and threshold secret sharing, MDL can ensure that participants learn the global model only if a sufficient number of local updates are aggregated. Specifically, ElGamal encryption is used in MDL to achieve the confidentiality of local updates.

Related Work

In this section, we introduce some related works on privacy-preserving federated learning for DNN. We classify the existing schemes into two categories: those based on SA [29], [12], [13], [30], [31], [32], [33], [16] and those based on DP [34], [18], [35], [36], [19], [20].

Conclusion

In this paper, we proposed CORK, a privacy-preserving and lossless federated learning scheme for DNN. Based on our proposed drop-tolerant secure aggregation algorithm FTSA, CORK can, on the one hand, ensure that the local updates of participants remain confidential to AS and, on the other hand, keep the training process running normally even if some participants drop out. Besides, by applying our lossless model perturbation mechanism PTSP, under the premise of lossless model accuracy, a participant cannot infer other participants' sensitive data from the global model parameters.

CRediT authorship contribution statement

Jiaqi Zhao: Conceptualization, Writing - original draft, Software. Hui Zhu: Project administration, Funding acquisition. Fengwei Wang: Writing - review & editing. Rongxing Lu: Methodology. Hui Li: Supervision. Jingwei Tu: Funding acquisition. Jie Shen: Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by National Natural Science Foundation of China (61972304, 61932015), Science Foundation of the Ministry of Education (MCM20200101), and Shaanxi Provincial Key Research and Development Program (2020ZDLGY08-04).

References (36)

  • L. Zhu, S. Han, Deep leakage from gradients, in: NeurIPS, Morgan Kaufmann, 2019, pp. 14747–14756....
  • B. Zhao, K.R. Mopuri, H. Bilen, idlg: Improved deep leakage from gradients, CoRR abs/2001.02610....
  • L. Melis et al., Exploiting unintended feature leakage in collaborative learning...

  • K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H.B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth, Practical...
  • X. Zhang, S. Ji, H. Wang, T. Wang, Private, yet practical, multiparty deep learning, in: ICDCS, IEEE Computer Society,...
  • K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H.B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth, Practical...
  • F. Wang, H. Zhu, R. Lu, Y. Zheng, H. Li, Achieve efficient and privacy-preserving disease risk assessment over...
  • R. Shokri, V. Shmatikov, Privacy-preserving deep learning, in: CCS, ACM, 2015, pp. 1310–1321....