research-article

Billion-scale federated learning on mobile clients: a submodel design with tunable privacy

Authors:
Chaoyue Niu (Shanghai Jiao Tong University, China)
Fan Wu (Shanghai Jiao Tong University, China)
Shaojie Tang (University of Texas at Dallas)
Lifeng Hua (Alibaba Group, China)
Rongfei Jia (Alibaba Group, China)
Chengfei Lv (Alibaba Group, China)
Zhihua Wu (Alibaba Group, China)
Guihai Chen (Shanghai Jiao Tong University, China)

MobiCom '20: Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, April 2020, Article No.: 31, Pages 1–14. https://doi.org/10.1145/3372224.3419188

Published: 18 September 2020

ABSTRACT

Federated learning was proposed with the vision of enabling collaborative machine learning among numerous clients without uploading their private data to a cloud server. However, the conventional framework requires each client to train on the full model, which can be prohibitively inefficient for large-scale learning tasks and resource-constrained mobile devices. We therefore propose a submodel framework, in which clients download only the parts of the full model they need, namely submodels, and then upload the submodel updates. However, the "position" of a client's truly required submodel corresponds to its private data, so disclosing the true position to the cloud server during these interactions breaks the tenet of federated learning. To reconcile efficiency and privacy, we design a secure federated submodel learning scheme with a private set union protocol as its cornerstone. The secure scheme combines randomized response, secure aggregation, and a Bloom filter, and gives each client customized plausible deniability (in terms of local differential privacy) about the position of its desired submodel, thereby protecting its private data. We further instantiate the scheme for Alibaba's e-commerce recommendation, implement a prototype system, and evaluate it extensively on 30 days of Taobao user data. Empirical results demonstrate the feasibility and scalability of the proposed scheme, as well as its remarkable advantages over the conventional federated learning framework in terms of model accuracy and convergence and of practical communication, computation, and storage overhead.
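The plausible deniability mentioned in the abstract can be illustrated with a minimal sketch of classic randomized response applied to a vector of submodel positions. This is not the authors' protocol: it omits the private set union, secure aggregation, and Bloom filter components, and the function names, the per-position privacy parameter epsilon, and the example index set are all assumptions made for illustration. Each position is reported truthfully with probability e^epsilon / (1 + e^epsilon) and flipped otherwise, which gives epsilon-local differential privacy for each single position bit.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting one bit truthfully: e^eps / (1 + e^eps)."""
    # Written in the numerically stable form 1 / (1 + e^-eps).
    return 1.0 / (1.0 + math.exp(-epsilon))


def perturb_positions(true_positions, num_positions, epsilon, rng):
    """Apply randomized response independently to every position bit.

    true_positions: set of indices the client really needs (hypothetical).
    num_positions:  size of the index space of the full model (hypothetical).
    Returns the set of indices the client claims to need.
    """
    p_keep = keep_probability(epsilon)
    reported = set()
    for index in range(num_positions):
        truthful_bit = index in true_positions
        # Keep the true bit with probability p_keep, otherwise flip it.
        reported_bit = truthful_bit if rng.random() < p_keep else not truthful_bit
        if reported_bit:
            reported.add(index)
    return reported


if __name__ == "__main__":
    rng = random.Random(2020)
    needed = {2, 5, 7}  # illustrative "real" submodel positions
    print(perturb_positions(needed, 10, epsilon=1.0, rng=rng))
```

A smaller epsilon pushes the keep probability toward 1/2 and so gives stronger deniability, at the cost of downloading more unneeded positions and missing more needed ones. In this sketch, that trade-off is the sense in which the privacy level is tunable per client.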

Published in

MobiCom '20: Proceedings of the 26th Annual International Conference on Mobile Computing and Networking
April 2020
621 pages
ISBN: 9781450370851
DOI: 10.1145/3372224

Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bloom filter
federated submodel learning
local differential privacy
private set union
randomized response
recommendation systems
secure aggregation
Acceptance Rates
Overall Acceptance Rate: 440 of 2,972 submissions, 15%