Research article · KDD Conference Proceedings · DOI: 10.1145/3292500.3330765

FDML: A Collaborative Machine Learning Framework for Distributed Features

Published: 25 July 2019

ABSTRACT

Most existing distributed machine learning systems scale up model training through a data-parallel architecture that divides the computation for different samples among workers. We study distributed machine learning from a different motivation: the information about the same samples, e.g., users and objects, is owned by several parties that wish to collaborate but do not want to share their raw data with each other.

We propose an asynchronous stochastic gradient descent (SGD) algorithm for such a feature-distributed machine learning (FDML) problem, to jointly learn from distributed features, with theoretical convergence guarantees under bounded asynchrony. Our algorithm requires sharing neither the original features nor the local model parameters between parties, thus preserving data locality. The system can also easily incorporate differential privacy mechanisms to provide a higher level of privacy. We implement the FDML system in a parameter server architecture and compare it with fully centralized learning (which violates data locality) and learning based on only local features, through extensive experiments on both the public dataset a9a and a large dataset of 5,000,000 records and 8,700 decentralized features from three collaborating apps at Tencent: Tencent MyApp, Tencent QQ Browser, and Tencent Mobile Safeguard. Experimental results demonstrate that the proposed FDML system can significantly enhance app recommendation in Tencent MyApp by leveraging user and item features from the other apps, while preserving the locality and privacy of the features in each individual app to a high degree.
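To make the feature-distributed setup concrete, below is a minimal sketch, not the authors' implementation: each party holds a disjoint slice of every sample's features and trains a local linear model, and only scalar local outputs are exchanged to form the joint prediction, so raw features and local weights never leave a party. The sum aggregation into a logistic loss, the synchronous update (the paper's algorithm is asynchronous), and all names are illustrative assumptions.

```python
import numpy as np

class LocalParty:
    """One collaborating party: holds its own feature slice and local weights."""

    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)   # local parameters, never shared
        self.lr = lr

    def local_output(self, x_local):
        # Intermediate scalar sent to the aggregator instead of raw features.
        return float(x_local @ self.w)

    def update(self, x_local, grad_wrt_sum):
        # Chain rule: d(loss)/d(w_local) = d(loss)/d(aggregate) * x_local.
        self.w -= self.lr * grad_wrt_sum * x_local


def train_step(parties, x_slices, y):
    """One SGD step on a single sample (synchronous here for simplicity)."""
    z = sum(p.local_output(x) for p, x in zip(parties, x_slices))
    pred = 1.0 / (1.0 + np.exp(-z))        # joint logistic prediction
    grad = pred - y                         # d(log-loss)/dz
    for p, x in zip(parties, x_slices):
        p.update(x, grad)                   # each party updates only its own weights
    return -(y * np.log(pred + 1e-12) + (1 - y) * np.log(1 - pred + 1e-12))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    parties = [LocalParty(3), LocalParty(5)]        # two parties: 3 + 5 features
    for _ in range(500):
        x = rng.normal(size=8)
        y = float(x.sum() > 0)                      # synthetic label
        loss = train_step(parties, [x[:3], x[3:]], y)
    print("log-loss on last sample: %.4f" % loss)
```

In the actual FDML system described in the paper, such updates run asynchronously through a parameter-server-style architecture, and differentially private noise can be added to what each party shares.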


Supplemental Material

p2232-hu.mp4 (MP4, 981.3 MB)

