
Computers & Security

Volume 103, April 2021, 102199

Privacy-preserving and communication-efficient federated learning in Internet of Things

https://doi.org/10.1016/j.cose.2021.102199

Abstract

To address the privacy leakage caused by collecting data from numerous Internet of Things (IoT) devices for centralized training, a novel distributed learning framework, federated learning, has emerged, in which devices train models collaboratively while keeping their private datasets local. Although many federated learning schemes have been proposed, they still fall short in communication efficiency and privacy owing to limited network bandwidth and advanced privacy attacks. To address these challenges, we develop PCFL, a privacy-preserving and communication-efficient scheme for federated learning in IoT. PCFL is composed of three key components: (1) gradient spatial sparsification, in which irrelevant local updates that deviate from the collaborative convergence tendency are prevented from being uploaded; (2) bidirectional compression, in which computation-light compression operators quantize the gradients on both the device side and the server side; and (3) a privacy-preserving protocol that integrates secret sharing with lightweight homomorphic encryption to protect data privacy and resist various collusion scenarios. We analyze the correctness and privacy of our scheme and carry out theoretical and experimental comparisons on two real-world datasets. Results show that PCFL outperforms state-of-the-art methods by more than 2× in communication efficiency, while maintaining high model accuracy with only a marginal decrease in convergence rate.

Introduction

With the development of edge computing and Internet-of-Things (IoT) technologies, smart devices embedded with sensors and high-performance computing chips are widely deployed at the network edge to collect tremendous amounts of data. Under the centralized learning framework, these data must be uploaded to a central server for training machine learning models, so as to provide various intelligent services such as health assessment, financial risk prediction, and personalized recommendation. However, the data are often sensitive in many scenarios (Liu et al., 2020), and unrestricted use of them may cause serious privacy leakage, further hindering the development of intelligent services. To address these problems, federated learning (FL) has emerged as a novel distributed learning framework and received widespread attention: a group of IoT devices (clients) collaboratively learn a shared model by exchanging model updates under the coordination of a central server, without exposing their training data.

Despite its privacy-preserving design, recent works show that FL still faces the risk of indirect privacy leakage. For example, by observing the gradients transmitted during training, adversaries can mount model inversion attacks to recover face images from the dataset (Fredrikson et al., 2015), or membership inference attacks to speculate on the health records of a particular patient (Salem et al., 2018). In particular, when the server is honest-but-curious (a common assumption in FL), it can infer private information from the received local updates. Privacy preservation in FL therefore remains an important issue that needs to be strengthened with cryptographic techniques. In addition, FL faces the challenge of high communication overhead. Under the FL framework, clients must upload and download the whole gradient update in each round, which comprises millions of parameters, especially for deep learning models. Moreover, training on a large dataset often requires thousands of rounds, pushing the overall communication overhead above hundreds of gigabytes. This makes FL infeasible in IoT settings with limited bandwidth. Some FL-related works (Horvath et al., 2019; Ma et al., 2018) address privacy preservation or communication overhead separately, but few address both issues at the same time.

In this paper, a privacy-preserving and communication-efficient scheme for federated learning (PCFL) in IoT is proposed. Given a more practical threat model in which some of the semi-honest parties may collude with each other, we present a novel privacy-preserving protocol to prevent indirect privacy leakage. Furthermore, we cut down the communication overhead by removing irrelevant local updates and quantizing gradients with state-of-the-art compression operators. The main contributions are summarized as follows:

  • 1.

    Based on secret sharing and homomorphic encryption, we propose a novel privacy-preserving protocol to protect individual client’s privacy while also being robust to clients’ dropout. Moreover, PCFL adapts to various scenarios of collusion amongst semi-honest parties through the tunable parameter in secret sharing.

  • 2.

    We design a sparse bidirectional compression algorithm by integrating spatial sparsification with the state-of-the-art compression operators to reduce the communication cost dramatically. In particular, we prevent uploading irrelevant local updates that deviate from the collaborative convergence tendency to eliminate unnecessary communication.

  • 3.

    We compare PCFL with related works, both theoretically and experimentally, on two real-world datasets. Results show that PCFL outperforms the state-of-the-art methods by more than 2× in terms of communication efficiency, along with high model accuracy and only a slight loss of convergence rate.
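To make the sparsification-plus-quantization idea in contribution 2 concrete, the sketch below pairs a hypothetical sign-agreement relevance test (a simple proxy for "deviating from the collaborative convergence tendency") with 1-bit sign quantization scaled by the mean magnitude, a standard computation-light compression operator. The function names and the relevance criterion are illustrative assumptions, not PCFL's exact operators.

```python
import numpy as np

def relevance_mask(local_grad, global_grad):
    """Keep only coordinates whose local update agrees in sign with the
    previous global update (hypothetical relevance criterion)."""
    return np.sign(local_grad) == np.sign(global_grad)

def sign_compress(grad):
    """1-bit sign quantization with a per-vector scale, so each coordinate
    costs ~1 bit instead of 32."""
    scale = np.mean(np.abs(grad))
    return scale, np.sign(grad)

def decompress(scale, signs):
    return scale * signs

# Example: drop irrelevant coordinates, then compress the upload.
local = np.array([0.8, -0.2, 0.5, -0.9])
global_prev = np.array([0.3, 0.1, 0.4, -0.6])
mask = relevance_mask(local, global_prev)   # [True, False, True, True]
sparse = np.where(mask, local, 0.0)
scale, signs = sign_compress(sparse)        # scale = 0.55, signs in {-1, 0, 1}
recovered = decompress(scale, signs)
```

The server would apply the same operator before broadcasting, giving the bidirectional compression described above.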

The rest of the paper is organized as follows. We introduce related works in Section 2 and preliminaries in Section 3. The system model, security requirements and detailed scheme are shown in Section 4. Section 5 gives the security analysis, theoretical and experimental comparison of PCFL with state-of-the-art methods. We draw conclusions and outline the future work in Section 6.


Related work

Distributed machine learning based on stochastic gradient descent (SGD) has been studied extensively (Liu et al., 2019), and its privacy issues have received increasing attention in recent years. Shokri and Shmatikov (2015) proposed a privacy-preserving deep learning approach based on distributed selective SGD, where each client shares only partial gradients with the central server instead of the original data. But adversaries can still exploit the shared gradients to infer the private data of a particular client.

Threshold secret sharing

Secret sharing is an important cryptographic primitive that splits a secret into n shares and allocates them to n parties. The secret can be reconstructed only if at least m shares are combined, while any set of fewer than m shares reveals no information about the secret. Take one of the most widely used techniques, Shamir's secret sharing (Shamir, 1979), as an example. It is based on the Lagrange interpolation polynomial. Randomly select m-1 integers a_1, a_2, ..., a_(m-1) and construct a degree-(m-1) polynomial f(x) = s + a_1·x + a_2·x^2 + ... + a_(m-1)·x^(m-1), where s is the secret.
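A minimal sketch of Shamir's (n, m)-threshold scheme over a prime field follows: shares are evaluations of the random polynomial at x = 1, ..., n, and reconstruction is Lagrange interpolation at x = 0. The prime and parameter values are illustrative choices, not ones prescribed by the paper.

```python
import random

P = 2**61 - 1  # a Mersenne prime; the field must exceed the secret

def share(secret, n, m):
    """Split `secret` into n shares such that any m can reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(m - 1)]
    def f(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation of f at x = 0 recovers the secret."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # pow(den, P-2, P) is the modular inverse of den (Fermat's little theorem)
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = share(123456789, n=5, m=3)
recovered = reconstruct(shares[:3])  # any 3 of the 5 shares suffice
```

The threshold m is the tunable parameter mentioned in contribution 1: raising it tolerates more colluding parties at the cost of requiring more surviving clients.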

Proposed scheme

To address the challenges of indirect privacy leakage and high communication overhead in federated learning, we propose a privacy-preserving and communication-efficient scheme. We first describe the system model and security requirements, and then present the construction of our scheme. The major symbols used in this paper are listed in Table 1.
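PCFL's protocol combines secret sharing with lightweight homomorphic encryption; the full construction is beyond this snippet, but the core principle of secure aggregation, that an honest-but-curious server should see only the sum of updates, never an individual one, can be sketched with pairwise additive masks that cancel in the aggregate. All names and the modulus here are illustrative assumptions, and this toy omits the dropout handling and collusion resistance the actual protocol provides.

```python
import random

MOD = 2**32  # toy modulus for integer-encoded updates

def pairwise_masks(client_ids, seed=0):
    """Each pair (i, j) with i < j agrees on a random mask; i will add it
    and j will subtract it, so all masks cancel in the sum."""
    rng = random.Random(seed)
    return {(i, j): rng.randrange(MOD)
            for i in client_ids for j in client_ids if i < j}

def mask_update(cid, update, masks):
    """Blind one client's update; individually it looks uniformly random."""
    y = update
    for (i, j), m in masks.items():
        if cid == i:
            y = (y + m) % MOD
        elif cid == j:
            y = (y - m) % MOD
    return y

clients = [1, 2, 3]
updates = {1: 10, 2: 20, 3: 30}
masks = pairwise_masks(clients)
masked = [mask_update(c, updates[c], masks) for c in clients]
aggregate = sum(masked) % MOD  # masks cancel, leaving the true sum 60
```

In a secret-sharing-based protocol, each pairwise mask is itself shared among the parties so the sum can still be recovered if a client drops out mid-round.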

Evaluation

In this section, we first analyze the privacy and correctness of our scheme, and then compare PCFL with several state-of-the-art privacy-preserving FL methods theoretically and experimentally.

Conclusion and future work

This paper proposes a privacy-preserving and communication-efficient scheme for federated learning in IoT, called PCFL. To reduce the communication bits, we design a sparse bidirectional compression algorithm that quantizes the uploaded and downloaded gradients. This makes our scheme suitable for IoT scenarios where communication is expensive or network bandwidth is limited. In addition, by combining secret sharing with lightweight homomorphic encryption, our scheme not only protects the privacy of individual clients but also resists collusion among semi-honest parties while remaining robust to client dropout.

CRediT authorship contribution statement

Chen Fang: Conceptualization, Methodology, Writing - original draft. Yuanbo Guo: Supervision, Funding acquisition, Project administration. Yongjin Hu: Writing - review & editing. Bowen Ma: Visualization, Investigation. Li Feng: Software, Validation. Anqi Yin: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been supported by the National Natural Science Foundation of China (Grant no. 61601515) and Foundation of Science and Technology on Information Assurance Laboratory (No. KJ-15-108).

Chen Fang received a master degree in software engineering from Information Engineering University, Henan, China, in 2018. He is currently pursuing the Ph.D. degree in Information Engineering University, Henan, China. His current research interests are privacy and security in AI.

References (24)

  • Y. Dong et al.

    Eastfly: efficient and secure ternary federated learning

    Comput. Secur.

    (2020)
  • C. Fang et al.

    Highly efficient federated learning with strong privacy preservation in cloud computing

    Comput. Secur.

    (2020)
  • X. Ma et al.

    Privacy preserving multi-party computation delegation for deep learning in cloud computing

    Inf. Sci.

    (2018)
  • Y. Aono et al.

    Privacy-preserving deep learning via additively homomorphic encryption

    IEEE Trans. Inf. Forensics Secur.

    (2017)
  • M. Asad et al.

    FedOpt: towards communication efficiency and privacy preservation in federated learning

    Appl. Sci.

    (2020)
  • T. ElGamal

    A public key cryptosystem and a signature scheme based on discrete logarithms

    IEEE Trans. Inf. Theory

    (1985)
  • M. Fredrikson et al.

    Model inversion attacks that exploit confidence information and basic countermeasures

    Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security

    (2015)
  • M. Hao et al.

    Towards efficient and privacy-preserving federated deep learning

    ICC 2019-2019 IEEE International Conference on Communications (ICC)

    (2019)
  • Horvath, S., Ho, C.-Y., Horvath, L., Sahu, A. N., Canini, M., Richtarik, P., 2019. Natural compression for distributed...
  • K. Hsieh et al.

    Gaia: geo-distributed machine learning approaching {LAN} speeds

    14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17)

    (2017)
  • Hu, R., Gong, Y., Guo, Y., 2020. Cpfed: communication-efficient and privacy-preserving federated learning. arXiv...
  • Li, T., Liu, Z., Sekar, V., Smith, V., 2019. Privacy for free: communication-efficient learning with differential...

    Yuanbo Guo is a professor at the Information Engineering University, Henan, China. He received the Ph.D. degree from Xidian University. His research interests include network defense, data mining, machine learning and AI security etc.

    Yongjin Hu received his B.E., M.S. degree in software engineering from Harbin Institute of Technology in 2004 and 2006. He currently is a Ph.D. candidate in software engineering at Information Engineering University, Henan, China. He is also a lecturer in Information Engineering University, Henan. His research interests include cyber deception, data mining.

    Bowen Ma is a lecturer in Information Engineering University, Henan, China. His current research interests are machine learning and data mining.

    Li Feng received a master degree from Information Engineering University, Henan, China, in 2018. He currently is an engineer in Beijing Institute of Remote Sensing Information. His research interests are machine learning and data mining.

    Anqi Yin currently is a Ph.D candidate in software engineering at Information Engineering University, Henan. Her research interests include cryptographic protocol, privacy preservation.
