DOI: 10.1145/3492324.3494167
Research Article
Public Access

A Proactive Data-Parallel Framework for Machine Learning

Published: 13 January 2022

Abstract

Data-parallel frameworks have become essential for training machine learning models. The classic Bulk Synchronous Parallel (BSP) model updates the model parameters at pre-defined synchronization barriers. However, when one worker computes significantly more slowly than the others, waiting for the slow worker at each barrier wastes computing resources. In this paper, we propose a novel proactive data-parallel (PDP) framework. PDP enables the parameter server to initiate the update of the model parameters; that is, updates can be performed at any time, without pre-defined update points. PDP not only initiates updates but also decides when to perform them, and this global decision on update frequency accelerates training. We further propose asynchronous PDP to reduce the idle time caused by synchronizing parameter updates, and we theoretically prove its convergence. We implement a distributed PDP framework and evaluate it with several popular machine learning algorithms, including Multilayer Perceptron, Convolutional Neural Network, K-means, and Gaussian Mixture Model. Our evaluation shows that PDP achieves up to a 20X speedup over the BSP model and scales to large clusters.
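The abstract's core idea, a parameter server that itself decides when to fold in worker gradients instead of waiting at a per-iteration barrier, can be illustrated with a small sketch. The Python code below is not the authors' implementation, and all names in it (ToyParameterServer, worker_loop, min_contributions) are hypothetical; it only mirrors the flavor of server-initiated updates on a toy least-squares problem, with one deliberately slow worker standing in for a straggler.

```python
# Minimal sketch (not the PDP implementation from the paper): the server, rather
# than a pre-defined barrier, decides when to apply accumulated worker gradients.
import threading
import time
import queue
import numpy as np

class ToyParameterServer:
    def __init__(self, dim, lr=0.1, min_contributions=2, poll_interval=0.01):
        self.params = np.zeros(dim)
        self.lr = lr
        self.min_contributions = min_contributions  # server-chosen update trigger
        self.poll_interval = poll_interval
        self.grad_queue = queue.Queue()             # workers push partial gradients here
        self.stop = threading.Event()

    def run(self):
        # The server initiates updates: whenever enough partial gradients have
        # arrived (or a timeout fires), it applies them immediately, without
        # waiting for a contribution from every worker.
        while not self.stop.is_set():
            grads = []
            try:
                grads.append(self.grad_queue.get(timeout=self.poll_interval))
                while len(grads) < self.min_contributions:
                    grads.append(self.grad_queue.get(timeout=self.poll_interval))
            except queue.Empty:
                pass  # timeout: update with whatever has arrived, if anything
            if grads:
                self.params -= self.lr * np.mean(grads, axis=0)

def worker_loop(server, data, targets, steps=50, slow=False):
    # Each worker repeatedly computes a least-squares gradient on its partition
    # against the latest parameters and pushes it; there is no barrier.
    for _ in range(steps):
        if slow:
            time.sleep(0.02)  # simulate a straggler
        w = server.params.copy()
        grad = data.T @ (data @ w - targets) / len(targets)
        server.grad_queue.put(grad)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -3.0, 0.5])
    X = rng.normal(size=(300, 3))
    y = X @ true_w
    server = ToyParameterServer(dim=3)
    server_thread = threading.Thread(target=server.run)
    server_thread.start()
    workers = [threading.Thread(target=worker_loop,
                                args=(server, X[i::3], y[i::3]),
                                kwargs={"slow": i == 2})
               for i in range(3)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    time.sleep(0.2)  # let the server drain the remaining gradients
    server.stop.set()
    server_thread.join()
    print("learned:", np.round(server.params, 2), "target:", true_w)
```

Under a BSP schedule, every update would wait for one gradient from each of the three workers, so the straggler would stall every step; in this sketch, the server updates as soon as enough gradients have accumulated, and the slow worker delays only its own contributions.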


Cited By

  • (2024) RoleML: A Role-Oriented Programming Model for Customizable Distributed Machine Learning on Edges. Proceedings of the 25th International Middleware Conference, 279-291. https://doi.org/10.1145/3652892.3700765. Online publication date: 2 December 2024.
  • (2023) FSP: Towards Flexible Synchronous Parallel Frameworks for Distributed Machine Learning. IEEE Transactions on Parallel and Distributed Systems 34(2), 687-703. https://doi.org/10.1109/TPDS.2022.3228733. Online publication date: 1 February 2023.

Published In

BDCAT '21: Proceedings of the 2021 IEEE/ACM 8th International Conference on Big Data Computing, Applications and Technologies
December 2021
133 pages
ISBN: 9781450391641
DOI: 10.1145/3492324

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. asynchronous distributed computation
  2. expectation-maximization
  3. gradient descent
  4. machine learning
  5. stragglers

Qualifiers

  • Research-article
  • Research
  • Refereed limited


Conference

BDCAT '21

Acceptance Rates

Overall acceptance rate: 27 of 93 submissions (29%)
