Privacy-preserving and communication-efficient federated learning in Internet of Things
Introduction
With the development of edge computing and Internet-of-Things (IoT) technologies, smart devices embedded with sensors and high-performance computing chips are widely deployed at the network edge to collect tremendous amounts of data. Under the centralized learning framework, these data must be uploaded to a central server for training machine learning models, so as to provide various intelligent services such as health assessment, financial risk prediction, and personalized recommendation. However, the data are often sensitive in many scenarios (Liu et al., 2020), and unrestricted use of them may cause serious privacy leakage, further hindering the development of intelligent services. To address these problems, federated learning (FL) has emerged as a novel distributed learning framework and received widespread attention: a group of IoT devices (clients) collaboratively learn a shared model by exchanging model updates under the coordination of the central server, without exposing their training data.
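The FL workflow described above can be sketched concretely. The following is a minimal FedAvg-style round on a toy linear-regression task; the model, data, and learning rate are illustrative assumptions, not details from the paper:

```python
import numpy as np

def client_update(w, data, lr=0.1):
    """One local SGD step on a client's private data (toy linear model).
    Only the updated weights leave the device, never the raw data."""
    X, y = data
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    return w - lr * grad

def fedavg_round(global_w, client_datasets):
    """Server broadcasts global_w; clients train locally; server averages."""
    local_ws = [client_update(global_w.copy(), d) for d in client_datasets]
    return np.mean(local_ws, axis=0)

# Five clients, each holding a private shard of the same linear problem.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(5):
    X = rng.normal(size=(20, 2))
    clients.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(50):
    w = fedavg_round(w, clients)
```

Only the locally updated weights cross the network; the raw (X, y) pairs never leave each client, which is the privacy premise that the rest of the paper hardens cryptographically.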
Despite its privacy-preserving attributes, recent works show that FL still faces the risk of indirect privacy leakage. For example, by observing the gradients transmitted during training, adversaries can mount model inversion attacks to recover face images in the dataset (Fredrikson et al., 2015), or perform membership inference attacks to speculate on the health records of a particular patient (Salem et al., 2018). In particular, when the server is honest-but-curious (a common assumption in FL), it can infer private information from the received local updates. Therefore, privacy preservation in FL remains an important issue that needs to be strengthened with cryptographic techniques. In addition, FL faces the challenge of high communication overhead. Under the FL framework, clients have to upload and download the whole gradient update in each round, which comprises millions of parameters, especially for deep learning models. Moreover, training often requires thousands of rounds when FL is applied to a large dataset, driving the overall communication overhead to hundreds of gigabytes. This makes FL infeasible in IoT settings with limited bandwidth. Some FL-related works (e.g., Horvath et al., 2019) address the privacy preservation or communication overhead issues separately, but few address both issues simultaneously.
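The overhead estimate above is simple arithmetic. A back-of-the-envelope helper (the model size is an illustrative assumption, roughly that of ResNet-50 with 32-bit parameters):

```python
def fl_traffic_gb(num_params, rounds, bytes_per_param=4):
    """Per-client traffic (upload + download of the full model each round), in GB."""
    per_round = 2 * num_params * bytes_per_param
    return per_round * rounds / 1e9

# A ResNet-50-sized model (~25.6M parameters, 32-bit floats) over 1000 rounds
# already costs each client about 205 GB of traffic.
print(fl_traffic_gb(25_600_000, 1000))  # 204.8
```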
In this paper, a privacy-preserving and communication-efficient scheme for federated learning (PCFL) in IoT is proposed. Given a more practical threat model where some of the semi-honest parties may collude with each other, we present a novel privacy-preserving protocol to prevent indirect privacy leakage. Furthermore, we cut down the communication overhead by removing irrelevant local updates and quantizing gradients with state-of-the-art compression operators. Specifically, the main contributions are summarized as follows:
1. Based on secret sharing and homomorphic encryption, we propose a novel privacy-preserving protocol that protects each individual client's privacy while remaining robust to client dropout. Moreover, PCFL adapts to various collusion scenarios among semi-honest parties through a tunable parameter in the secret sharing.
2. We design a sparse bidirectional compression algorithm by integrating spatial sparsification with state-of-the-art compression operators to dramatically reduce the communication cost. In particular, we prevent uploading irrelevant local updates that deviate from the collaborative convergence tendency, eliminating unnecessary communication.
3. We compare PCFL with related works, both theoretically and experimentally, on two real-world datasets. Results show that PCFL outperforms state-of-the-art methods by more than 2× in terms of communication efficiency, while maintaining high model accuracy with only a slight loss in convergence rate.
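As a rough illustration of contribution 2, the sketch below combines top-k sparsification with a 1-bit sign quantizer. It is a generic stand-in for the compression operators actually used in PCFL, and the relevance test that filters out irrelevant updates is omitted:

```python
import numpy as np

def top_k_sparsify(g, k):
    """Keep the k largest-magnitude coordinates; zero out the rest."""
    idx = np.argpartition(np.abs(g), -k)[-k:]
    sparse = np.zeros_like(g)
    sparse[idx] = g[idx]
    return sparse

def sign_quantize(g):
    """1-bit sign quantization of the retained coordinates, scaled by their
    mean magnitude so the compressed vector keeps a comparable norm."""
    mask = g != 0
    if not mask.any():
        return g
    scale = np.abs(g[mask]).mean()
    q = np.zeros_like(g)
    q[mask] = scale * np.sign(g[mask])
    return q

g = np.array([0.9, -0.05, 0.4, 0.01, -0.7])
compressed = sign_quantize(top_k_sparsify(g, k=3))
# Only k indices, k sign bits, and one float scale need to be transmitted,
# instead of one full-precision float per coordinate.
```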
The rest of the paper is organized as follows. We introduce related works in Section 2 and preliminaries in Section 3. The system model, security requirements and detailed scheme are shown in Section 4. Section 5 gives the security analysis, theoretical and experimental comparison of PCFL with state-of-the-art methods. We draw conclusions and outline the future work in Section 6.
Related work
Distributed machine learning based on stochastic gradient descent (SGD) has been extensively studied (Liu et al., 2019), and its privacy issues have received growing attention in recent years. Shokri and Shmatikov (2015) proposed a privacy-preserving deep learning approach based on distributed selective SGD, where each client shares only partial gradients with the central server instead of the original data. But adversaries can still exploit the shared gradients to infer the private data of a target client.
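The selective-sharing idea can be sketched as follows; the magnitude-based selection rule and the fraction parameter theta are a simplification of Shokri and Shmatikov's protocol:

```python
import numpy as np

def select_gradients(grad, theta=0.1):
    """Share only the fraction theta of coordinates with the largest
    magnitudes, as (index, value) pairs; the rest stay on the device."""
    k = max(1, int(theta * grad.size))
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return [(int(i), float(grad[i])) for i in idx]

grad = np.array([0.02, -0.8, 0.1, 0.5, -0.03, 0.01, 0.3, -0.2, 0.05, 0.9])
shared = select_gradients(grad, theta=0.2)  # 2 of 10 coordinates leave the client
```

Even this partial disclosure leaks information, which motivates the cryptographic protections discussed next.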
Threshold secret sharing
Secret sharing is an important cryptographic primitive that splits a secret into n shares and distributes them among n parties. The secret can be reconstructed only when at least m shares are combined, while any set of fewer than m shares reveals no information about the secret. Take the most widely used technique, Shamir's secret sharing (Shamir, 1979), as an example. It is based on Lagrange polynomial interpolation: randomly select m-1 integers as coefficients and construct a degree-(m-1) polynomial whose constant term is the secret.
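A minimal sketch of an (m, n) Shamir scheme over a prime field; the prime modulus and example secret are illustrative choices:

```python
import random

P = 2**31 - 1  # public prime modulus, larger than any secret to be shared

def share(secret, n, m):
    """Split `secret` into n shares; any m of them reconstruct it.
    f(x) = secret + a1*x + ... + a_{m-1}*x^{m-1} (mod P); share i is (i, f(i))."""
    coeffs = [secret] + [random.randrange(P) for _ in range(m - 1)]
    def f(x):
        return sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P) recovers f(0) = secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return secret

shares = share(123456, n=5, m=3)
recovered = reconstruct(shares[:3])  # any 3 of the 5 shares suffice
```

Because any m-1 shares are consistent with every possible secret, a server that colludes with fewer than m parties learns nothing, which is the property the tunable threshold in PCFL exploits.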
Proposed scheme
To address the challenges of indirect privacy leakage and high communication overhead in federated learning, we propose a privacy-preserving and communication-efficient scheme. We first describe the system model and security requirements, then present the construction of our scheme. The major symbols used in this paper are listed in Table 1.
Evaluation
In this section, we first analyze the privacy and correctness of our scheme, and then compare PCFL with several state-of-the-art privacy-preserving FL methods theoretically and experimentally.
Conclusion and future work
This paper proposes a privacy-preserving and communication-efficient scheme for federated learning in IoT, called PCFL. In order to reduce the communication bits, we design a sparse bidirectional compression algorithm to quantize the uploaded and downloaded gradients. This makes our scheme suitable for IoT scenarios where communication is expensive or network bandwidth is limited. In addition, by combining secret sharing with lightweight homomorphic encryption, our scheme not only protects individual clients' privacy but is also robust to client dropout.
CRediT authorship contribution statement
Chen Fang: Conceptualization, Methodology, Writing - original draft. Yuanbo Guo: Supervision, Funding acquisition, Project administration. Yongjin Hu: Writing - review & editing. Bowen Ma: Visualization, Investigation. Li Feng: Software, Validation. Anqi Yin: Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work has been supported by the National Natural Science Foundation of China (Grant no. 61601515) and Foundation of Science and Technology on Information Assurance Laboratory (No. KJ-15-108).
Chen Fang received a master degree in software engineering from Information Engineering University, Henan, China, in 2018. He is currently pursuing the Ph.D. degree in Information Engineering University, Henan, China. His current research interests are privacy and security in AI.
References (24)
- et al., 2020. Eastfly: efficient and secure ternary federated learning. Comput. Secur.
- et al., 2020. Highly efficient federated learning with strong privacy preservation in cloud computing. Comput. Secur.
- et al., 2018. Privacy preserving multi-party computation delegation for deep learning in cloud computing. Inf. Sci.
- et al., 2017. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur.
- et al., 2020. FedOpt: towards communication efficiency and privacy preservation in federated learning. Appl. Sci.
- 1985. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Inf. Theory.
- et al., 2015. Model inversion attacks that exploit confidence information and basic countermeasures. Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security.
- et al., 2019. Towards efficient and privacy-preserving federated deep learning. ICC 2019 - 2019 IEEE International Conference on Communications (ICC).
- Horvath, S., Ho, C.-Y., Horvath, L., Sahu, A. N., Canini, M., Richtarik, P., 2019. Natural compression for distributed...
- et al., 2017. Gaia: geo-distributed machine learning approaching LAN speeds. 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17).
Yuanbo Guo is a professor at the Information Engineering University, Henan, China. He received the Ph.D. degree from Xidian University. His research interests include network defense, data mining, machine learning and AI security etc.
Yongjin Hu received his B.E., M.S. degree in software engineering from Harbin Institute of Technology in 2004 and 2006. He currently is a Ph.D. candidate in software engineering at Information Engineering University, Henan, China. He is also a lecturer in Information Engineering University, Henan. His research interests include cyber deception, data mining.
Bowen Ma is a lecturer in Information Engineering University, Henan, China. His current research interests are machine learning and data mining.
Li Feng received a master degree from Information Engineering University, Henan, China, in 2018. He currently is an engineer in Beijing Institute of Remote Sensing Information. His research interests are machine learning and data mining.
Anqi Yin currently is a Ph.D candidate in software engineering at Information Engineering University, Henan. Her research interests include cryptographic protocol, privacy preservation.