DOI: 10.1145/3372224.3419188
research-article

Billion-scale federated learning on mobile clients: a submodel design with tunable privacy

Published: 18 September 2020

ABSTRACT

Federated learning was proposed with the intriguing vision of enabling collaborative machine learning among numerous clients without uploading their private data to a cloud server. However, the conventional framework requires each client to use the full model for learning, which can be prohibitively inefficient for large-scale learning tasks and resource-constrained mobile devices. We therefore proposed a submodel framework in which clients download only the parts of the full model they need, namely submodels, and then upload the submodel updates. However, the "position" of the submodel a client truly requires corresponds to its private data, so disclosing the true position to the cloud server during these interactions inevitably breaks the tenet of federated learning. To reconcile efficiency and privacy, we designed a secure federated submodel learning scheme with a private set union protocol as its cornerstone. The secure scheme combines randomized response, secure aggregation, and a Bloom filter, and gives each client customized plausible deniability (in terms of local differential privacy) about the position of its desired submodel, thereby protecting its private data. We further instantiated the scheme for Alibaba's e-commerce recommendation, implemented a prototype system, and evaluated it extensively on 30 days of Taobao user data. Empirical results demonstrate the feasibility and scalability of the proposed scheme, as well as its remarkable advantages over the conventional federated learning framework in terms of model accuracy and convergence and of practical communication, computation, and storage overhead.
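
To illustrate how randomized response can give a client plausible deniability about the position of its submodel, the following is a minimal sketch and not the scheme from the paper. It assumes that a position is an integer index into a small, publicly known index universe, and it applies textbook binary randomized response independently to each membership bit. The names `keep_probability` and `randomize_positions` and the parameter `epsilon` are hypothetical and chosen only for this example; secure aggregation, the Bloom filter, and the private set union protocol are not modeled.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting one membership bit truthfully.

    Textbook binary randomized response: p = e^eps / (1 + e^eps),
    which makes each individual bit eps-locally differentially private.
    """
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_positions(true_positions, universe, epsilon, rng):
    """Return a perturbed index set (illustrative assumption, not the paper's protocol).

    For every index in the universe, the true membership bit is kept with
    probability keep_probability(epsilon) and flipped otherwise.
    """
    p = keep_probability(epsilon)
    reported = set()
    for idx in universe:
        truth = idx in true_positions
        bit = truth if rng.random() < p else not truth
        if bit:
            reported.add(idx)
    return reported


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so runs are repeatable
    universe = range(20)            # hypothetical index universe
    true_positions = {2, 5, 11}     # hypothetical indices the client really needs
    for eps in (0.5, 2.0, 8.0):
        reported = randomize_positions(true_positions, universe, eps, rng)
        print(eps, sorted(reported))
```

A smaller `epsilon` flips more bits, so the reported set reveals less about the true positions at the cost of downloading and updating more unneeded indices; a larger `epsilon` moves the reported set toward the true one. This trade-off is one simple reading of the "tunable privacy" in the title, under the assumptions stated above.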


          Published in

          MobiCom '20: Proceedings of the 26th Annual International Conference on Mobile Computing and Networking
          April 2020
          621 pages
          ISBN: 9781450370851
          DOI: 10.1145/3372224

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 440 of 2,972 submissions, 15%
