ABSTRACT
Federated learning was proposed with the intriguing vision of achieving collaborative machine learning among numerous clients without uploading their private data to a cloud server. However, the conventional framework requires each client to leverage the full model for learning, which can be prohibitively inefficient for large-scale learning tasks and resource-constrained mobile devices. We therefore proposed a submodel framework, in which clients download only the needed parts of the full model, namely submodels, and then upload the submodel updates. Nevertheless, the "position" of a client's truly required submodel corresponds to its private data, and disclosing the true position to the cloud server during interactions inevitably breaks the tenet of federated learning. To integrate efficiency and privacy, we designed a secure federated submodel learning scheme with a private set union protocol as its cornerstone. The secure scheme features the properties of randomized response, secure aggregation, and Bloom filter, and endows each client with customized plausible deniability (in terms of local differential privacy) with respect to the position of its desired submodel, thereby protecting its private data. We further instantiated the scheme with Alibaba's e-commerce recommendation, implemented a prototype system, and extensively evaluated it over 30 days of Taobao user data. Empirical results demonstrate the feasibility and scalability of the proposed scheme, as well as its remarkable advantages over the conventional federated learning framework in terms of model accuracy and convergence and of practical communication, computation, and storage overhead.
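To make the idea of plausible deniability over a submodel's "position" more concrete, the following is a minimal sketch of generic per-index randomized response, not the authors' implementation. It assumes a client's required submodel is described by a set of integer indices (for example, item IDs whose embedding rows it needs) and that a finite candidate universe of indices is available; in the real scheme, bounding that universe is presumably where the private set union protocol comes in, but here it is simply a parameter. The keep probability `p`, fake-report probability `q`, and the helper names `randomized_response` and `epsilon` are hypothetical. The privacy level uses the standard unary-encoding bound, which may differ from the exact customized guarantee derived in the paper.

```python
import math
import random
from typing import Iterable, Set


def randomized_response(true_indices: Set[int], universe: Iterable[int],
                        p: float, q: float, rng: random.Random) -> Set[int]:
    """Perturb a client's set of required submodel indices (hypothetical helper).

    Each index in the true set is reported with probability p; each index
    outside it is falsely reported with probability q. The server sees only
    the perturbed set, so no single reported index is conclusive.
    """
    if not (0.0 <= q < p <= 1.0):
        raise ValueError("expected 0 <= q < p <= 1")
    reported = set()
    for idx in universe:
        report_prob = p if idx in true_indices else q
        if rng.random() < report_prob:
            reported.add(idx)
    return reported


def epsilon(p: float, q: float) -> float:
    """Local DP level of the per-index mechanism above (unary-encoding bound).

    Worst case: two inputs that differ in one included and one excluded index,
    giving ln(p(1 - q) / ((1 - p) q)). Requires 0 < q < p < 1.
    """
    if not (0.0 < q < p < 1.0):
        raise ValueError("expected 0 < q < p < 1")
    return math.log(p * (1 - q) / ((1 - p) * q))


if __name__ == "__main__":
    rng = random.Random(0)
    universe = range(20)          # assumed toy candidate index space
    true_set = {2, 5, 11}         # assumed true submodel position
    noisy = randomized_response(true_set, universe, p=0.9, q=0.1, rng=rng)
    print(sorted(noisy))
    print(round(epsilon(0.9, 0.1), 3))  # ln(81), about 4.394
```

Raising `p` toward 1 and lowering `q` toward 0 makes the reported set closer to the true one (less download overhead from fake indices) but increases epsilon, that is, weakens deniability; this is the efficiency/privacy knob that a "customized" guarantee would let each client tune.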
Index Terms
- Billion-scale federated learning on mobile clients: a submodel design with tunable privacy