DOI: 10.1145/3492324.3494167
Research Article
Public Access

A Proactive Data-Parallel Framework for Machine Learning

Published: 13 January 2022

Abstract

Data-parallel frameworks have become essential for training machine learning models. The classic Bulk Synchronous Parallel (BSP) model updates the model parameters at pre-defined synchronization barriers. However, when one worker computes significantly more slowly than the others, waiting for the slow worker at each barrier wastes computing resources. In this paper, we propose a novel proactive data-parallel (PDP) framework. PDP enables the parameter server to initiate the update of the model parameters; that is, updates can be performed at any time, without pre-defined update points. PDP not only initiates updates but also decides when to perform them, and this global decision on update frequency accelerates training. We further propose asynchronous PDP to reduce the idle time caused by synchronizing parameter updates, and we theoretically prove its convergence. We implement a distributed PDP framework and evaluate it with several popular machine learning algorithms, including Multilayer Perceptron, Convolutional Neural Network, K-means, and Gaussian Mixture Model. Our evaluation shows that PDP achieves up to a 20X speedup over the BSP model and scales to large clusters.
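The abstract's core idea, a parameter server that itself decides when to fold in worker gradients instead of waiting at a per-iteration barrier, can be illustrated with a small sketch. The Python code below is not the authors' implementation, and all names in it (ToyParameterServer, worker_loop, min_contributions) are hypothetical; it only mirrors the flavor of server-initiated updates on a toy least-squares problem, with one deliberately slow worker standing in for a straggler.

```python
# Minimal sketch (not the PDP implementation from the paper): the server, rather
# than a pre-defined barrier, decides when to apply accumulated worker gradients.
import threading
import time
import queue
import numpy as np

class ToyParameterServer:
    def __init__(self, dim, lr=0.1, min_contributions=2, poll_interval=0.01):
        self.params = np.zeros(dim)
        self.lr = lr
        self.min_contributions = min_contributions  # server-chosen update trigger
        self.poll_interval = poll_interval
        self.grad_queue = queue.Queue()             # workers push partial gradients here
        self.stop = threading.Event()

    def run(self):
        # The server initiates updates: whenever enough partial gradients have
        # arrived (or a timeout fires), it applies them immediately, without
        # waiting for a contribution from every worker.
        while not self.stop.is_set():
            grads = []
            try:
                grads.append(self.grad_queue.get(timeout=self.poll_interval))
                while len(grads) < self.min_contributions:
                    grads.append(self.grad_queue.get(timeout=self.poll_interval))
            except queue.Empty:
                pass  # timeout: update with whatever has arrived, if anything
            if grads:
                self.params -= self.lr * np.mean(grads, axis=0)

def worker_loop(server, data, targets, steps=50, slow=False):
    # Each worker repeatedly computes a least-squares gradient on its partition
    # against the latest parameters and pushes it; there is no barrier.
    for _ in range(steps):
        if slow:
            time.sleep(0.02)  # simulate a straggler
        w = server.params.copy()
        grad = data.T @ (data @ w - targets) / len(targets)
        server.grad_queue.put(grad)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -3.0, 0.5])
    X = rng.normal(size=(300, 3))
    y = X @ true_w
    server = ToyParameterServer(dim=3)
    server_thread = threading.Thread(target=server.run)
    server_thread.start()
    workers = [threading.Thread(target=worker_loop,
                                args=(server, X[i::3], y[i::3]),
                                kwargs={"slow": i == 2})
               for i in range(3)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    time.sleep(0.2)  # let the server drain the remaining gradients
    server.stop.set()
    server_thread.join()
    print("learned:", np.round(server.params, 2), "target:", true_w)
```

Under a BSP schedule, every update would wait for one gradient from each of the three workers, so the straggler would stall every step; in this sketch, the server updates as soon as enough gradients have accumulated, and the slow worker delays only its own contributions.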


Cited By

  • (2024) RoleML: A Role-Oriented Programming Model for Customizable Distributed Machine Learning on Edges. Proceedings of the 25th International Middleware Conference, 279-291. https://doi.org/10.1145/3652892.3700765. Online publication date: 2 December 2024.
  • (2023) FSP: Towards Flexible Synchronous Parallel Frameworks for Distributed Machine Learning. IEEE Transactions on Parallel and Distributed Systems 34(2), 687-703. https://doi.org/10.1109/TPDS.2022.3228733. Online publication date: 1 February 2023.

Published In

BDCAT '21: Proceedings of the 2021 IEEE/ACM 8th International Conference on Big Data Computing, Applications and Technologies
December 2021
133 pages
ISBN: 9781450391641
DOI: 10.1145/3492324

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. asynchronous distributed computation
  2. expectation-maximization
  3. gradient descent
  4. machine learning
  5. stragglers

Qualifiers

  • Research-article
  • Research
  • Refereed limited


Conference

BDCAT '21

Acceptance Rates

Overall acceptance rate: 27 of 93 submissions (29%)
