Accelerating Distributed Machine Learning by Smart Parameter Server

Published: 17 August 2019

Abstract

The Parameter Server (PS) architecture is widely used in distributed machine learning (DML), but how to improve DML performance within this framework remains an open issue. Existing work mainly approaches the problem from the workers' side. In this paper, we tackle it from another perspective, by leveraging the central control available at the PS. Specifically, we propose SmartPS, which transforms the PS from its passive role in traditional DML and fully exploits its intelligence. First, the PS holds a global view of parameter dependencies, enabling it to update workers' parameters selectively and proactively. Second, the PS records workers' speeds and prioritizes parameter transmission to narrow the gap between stragglers and fast workers. Third, the PS considers parameter dependencies across consecutive training iterations and opportunistically blocks unnecessary pushes from workers. We conduct comparative experiments with two typical benchmarks, Matrix Factorization (MF) and PageRank (PR). The experimental results show that, compared with all the baseline algorithms (i.e., standard BSP, ASP, and SSP), SmartPS reduces the overall training time by 65.7%~84.9% while achieving the same training accuracy.
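
To make the abstract's three PS-side mechanisms concrete, here is a minimal, hypothetical Python sketch (not the authors' implementation; all class, method, worker, and block names are invented for illustration). It shows a parameter server that (1) uses a dependency map to proactively push only the parameter blocks each worker needs, (2) serves the slowest workers first, and (3) drops pushes of blocks that no other worker will read in the next iteration.

from collections import defaultdict

class SmartPSSketch:
    """Hypothetical PS-side bookkeeping: dependency-aware proactive pushes,
    straggler-first scheduling, and blocking of redundant pushes."""

    def __init__(self, dependency):
        # dependency[block_id] -> set of worker ids that read this block in
        # the next iteration (the PS's "global view" of parameter dependency).
        self.dependency = defaultdict(set, dependency)
        self.params = {}                   # block_id -> latest value
        self.progress = defaultdict(int)   # worker_id -> finished iterations

    def record_progress(self, worker, iteration):
        # Called whenever a worker reports a completed iteration; lets the
        # PS estimate which workers are lagging behind.
        self.progress[worker] = max(self.progress[worker], iteration)

    def push(self, worker, block, value):
        # Third mechanism: opportunistically block a push if no other worker
        # depends on this block in the upcoming iteration.
        if self.dependency[block] - {worker}:
            self.params[block] = value
            return True
        return False

    def schedule_sends(self):
        # First and second mechanisms: proactively send each worker only the
        # blocks it depends on, serving the slowest workers (stragglers) first.
        stragglers_first = sorted(self.progress, key=self.progress.get)
        return [
            (w, sorted(b for b, readers in self.dependency.items() if w in readers))
            for w in stragglers_first
        ]

if __name__ == "__main__":
    ps = SmartPSSketch({"blk0": {"w0", "w1"}, "blk1": {"w1"}})
    ps.record_progress("w0", 3)
    ps.record_progress("w1", 1)          # w1 is the straggler
    print(ps.push("w0", "blk1", 0.5))    # True: w1 still needs blk1
    print(ps.push("w1", "blk1", 0.7))    # False: no other worker reads blk1
    print(ps.schedule_sends())           # w1 is scheduled before w0

In a real system the dependency map would come from the model's parameter-access pattern and the scheduling would be interleaved with network transmission, but this bookkeeping captures the essence of the PS-driven approach the abstract describes.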




      Published In

      APNet '19: Proceedings of the 3rd Asia-Pacific Workshop on Networking
      August 2019
      104 pages
      ISBN:9781450376358
      DOI:10.1145/3343180

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Distributed machine learning (DML)
      2. global view
      3. opportunistically block
      4. parameter dependency
      5. prioritize parameter transmission

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • The National Key Research and Development Program of China
      • The National Natural Science Foundation of China
      • The Research and Development Program in Key Areas of Guangdong Province

      Conference

      APNet '19

      Acceptance Rates

      Overall acceptance rate: 50 of 118 submissions (42%)

      Cited By

      • (2024) A high-performance dataflow-centric optimization framework for deep learning inference on the edge. Journal of Systems Architecture, 152, 103180. DOI: 10.1016/j.sysarc.2024.103180. Online publication date: Jul 2024.
      • (2023) Embracing Uncertainty for Equity in Resource Allocation in ML Training. Proceedings of the 52nd International Conference on Parallel Processing, 423-432. DOI: 10.1145/3605573.3605583. Online publication date: 7 Aug 2023.
      • (2023) Offloading Machine Learning to Programmable Data Planes: A Systematic Survey. ACM Computing Surveys, 56(1), 1-34. DOI: 10.1145/3605153. Online publication date: 26 Aug 2023.
      • (2023) Tree-Based Elastic Parameter Server to Schedule Resources to Accelerate Distributed Training. 2023 IEEE 11th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), 379-382. DOI: 10.1109/ITAIC58329.2023.10408975. Online publication date: 8 Dec 2023.
      • (2022) GSSP: Eliminating Stragglers Through Grouping Synchronous for Distributed Deep Learning in Heterogeneous Cluster. IEEE Transactions on Cloud Computing, 10(4), 2637-2648. DOI: 10.1109/TCC.2021.3062398. Online publication date: 1 Oct 2022.
      • (2021) DQ-DPS Data Partition Strategy Based on Distributed Machine Learning. Proceedings of the 2021 2nd International Conference on Artificial Intelligence in Electronics Engineering, 20-26. DOI: 10.1145/3460268.3460272. Online publication date: 15 Jan 2021.
      • (2021) H-PS: A Heterogeneous-Aware Parameter Server With Distributed Neural Network Training. IEEE Access, 9, 44049-44058. DOI: 10.1109/ACCESS.2021.3060154. Online publication date: 2021.
      • (2020) Elastic parameter server load distribution in deep learning clusters. Proceedings of the 11th ACM Symposium on Cloud Computing, 507-521. DOI: 10.1145/3419111.3421307. Online publication date: 12 Oct 2020.
      • (2020) Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning. IEEE Transactions on Wireless Communications, 19(12), 8272-8286. DOI: 10.1109/TWC.2020.3021177. Online publication date: 1 Dec 2020.
      • (2020) Online Resource Allocation With Machine Variability: A Bandit Perspective. IEEE/ACM Transactions on Networking, 28(5), 2243-2256. DOI: 10.1109/TNET.2020.3006906. Online publication date: Oct 2020.
