DOI: 10.1145/3322795.3331461

Horizontal or Vertical?: A Hybrid Approach to Large-Scale Distributed Machine Learning

Published: 17 June 2019

Abstract

Data parallelism and model parallelism are the two typical parallel modes for distributed machine learning (DML). Traditionally, DML has mainly leveraged data parallelism, which maintains one model instance on each node and synchronizes the model parameters at the end of every iteration. However, as models grow larger, the communication cost and GPU memory consumption become significant, so data parallelism fails to work efficiently at large scale, and model-parallel solutions have been proposed in recent years. In this paper, we comprehensively discuss the benefits and drawbacks of both approaches. Based on this comparative analysis, we propose Hove, a hybrid approach that combines data parallelism and model parallelism to balance their overheads and achieve high performance for large-scale DML.
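
As an illustration only (this is not code from the paper, and Hove's actual design is not reproduced here; every function name, tensor size, worker count, and learning rate below is an arbitrary assumption), the following NumPy sketch shows the structural difference the abstract draws on: a data-parallel step averages full-model gradients across worker replicas once per iteration, whereas a model-parallel forward pass partitions the layers so that each worker stores only its slice of the parameters and hands activations to the next partition.

    import numpy as np

    # Toy setup: all sizes and hyperparameters are arbitrary illustrative choices.
    NUM_WORKERS = 4
    rng = np.random.default_rng(0)

    def data_parallel_step(weights, data_shards, lr=0.01):
        """Data parallelism: every worker holds a full replica of `weights`,
        computes a gradient on its own data shard, and the replicas stay
        consistent by averaging the gradients once per iteration (this
        per-iteration exchange is the communication cost the abstract mentions)."""
        grads = []
        for x, y in data_shards:                    # one full-model gradient per worker
            pred = x @ weights
            grads.append(2.0 * x.T @ (pred - y) / len(x))
        avg_grad = sum(grads) / len(grads)          # stands in for an all-reduce
        return weights - lr * avg_grad              # every replica applies the same update

    def model_parallel_forward(layer_weights, x):
        """Model parallelism: layers are partitioned across workers, so each worker
        stores only its own slice of the parameters; what crosses the worker
        boundary is the activation passed to the next partition."""
        activation = x
        for w in layer_weights:                     # imagine each w living on a different worker
            activation = np.maximum(activation @ w, 0.0)   # ReLU layer
        return activation

    # Usage: one data-parallel update over 4 shards, then a model-parallel forward pass.
    weights = rng.normal(size=(8, 1))
    shards = [(rng.normal(size=(16, 8)), rng.normal(size=(16, 1))) for _ in range(NUM_WORKERS)]
    weights = data_parallel_step(weights, shards)

    layers = [rng.normal(size=(8, 8)) for _ in range(NUM_WORKERS)]
    print(model_parallel_forward(layers, rng.normal(size=(2, 8))).shape)   # (2, 8)

Even in this toy form the trade-off the paper targets is visible: the data-parallel step exchanges a gradient the size of the whole model every iteration, while the naive model-parallel path exchanges only activations but serializes work across the partitions.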

      Published In

      ScienceCloud '19: Proceedings of the 10th Workshop on Scientific Cloud Computing
      June 2019
      32 pages
      ISBN:9781450367585
      DOI:10.1145/3322795

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. GPU utilization
      2. communication overhead
      3. data parallelism
      4. hybrid approach
      5. model parallelism

      Qualifiers

      • Short-paper

      Conference

      HPDC '19

      Acceptance Rates

      ScienceCloud '19 Paper Acceptance Rate 22 of 106 submissions, 21%;
      Overall Acceptance Rate 44 of 151 submissions, 29%

      Bibliometrics & Citations

      Article Metrics

      • Downloads (Last 12 months): 38
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 17 Feb 2025

      Cited By

      • (2024) "A high-performance dataflow-centric optimization framework for deep learning inference on the edge." Journal of Systems Architecture, 152:103180. DOI: 10.1016/j.sysarc.2024.103180. Online publication date: Jul-2024.
      • (2024) "A comprehensive survey and taxonomy on privacy-preserving deep learning." Neurocomputing, 576:C. DOI: 10.1016/j.neucom.2024.127345. Online publication date: 25-Jun-2024.
      • (2022) "FuncPipe: A Pipelined Serverless Framework for Fast and Cost-Efficient Training of Deep Learning Models." Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6(3):1-30. DOI: 10.1145/3570607. Online publication date: 8-Dec-2022.
      • (2022) "BaPipe: Balanced Pipeline Parallelism for DNN Training." Parallel Processing Letters, 32(03n04). DOI: 10.1142/S0129626422500050. Online publication date: 19-Aug-2022.
      • (2021) "DAPPLE." Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 431-445. DOI: 10.1145/3437801.3441593. Online publication date: 17-Feb-2021.
      • (2019) "Accelerating Distributed Machine Learning by Smart Parameter Server." Proceedings of the 3rd Asia-Pacific Workshop on Networking, 92-98. DOI: 10.1145/3343180.3343192. Online publication date: 17-Aug-2019.
      • (2019) "DLBooster." Proceedings of the 48th International Conference on Parallel Processing, 1-11. DOI: 10.1145/3337821.3337892. Online publication date: 5-Aug-2019.
      • (2019) "Grundzüge des maschinellen Lernens." Blockchain und maschinelles Lernen, 89-142. DOI: 10.1007/978-3-662-60408-3_3. Online publication date: 28-Nov-2019.
