DOI: 10.1145/3322795.3331463

ElasticPipe: An Efficient and Dynamic Model-Parallel Solution to DNN Training

Published: 17 June 2019

Abstract

Traditional deep neural network (DNN) training is executed with data parallelism, which suffers from significant communication overhead and GPU memory consumption. Motivated by this, recent pioneering works have attempted to train DNNs with model parallelism. However, model partitioning remains a major concern, and a static partition fails to adapt to the ever-changing computing environment of a cloud cluster. This paper proposes ElasticPipe, which trains neural networks with pipe-based model parallelism. Unlike data-parallel solutions, each node in ElasticPipe holds only part of the whole model, leading to much lower communication cost and GPU memory consumption. More importantly, ElasticPipe can dynamically tune the workload distribution among nodes, so it mitigates the common straggler effect in cloud environments. Our preliminary experiments show that, compared to data-parallel baselines, ElasticPipe reduces training time by up to 89.03% when no stragglers are present, and by up to 76.72% when stragglers exist. Moreover, ElasticPipe outperforms its static counterpart by up to 28.81% in training performance when stragglers are involved.
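
To make the idea concrete, the following is a minimal, self-contained sketch (not the authors' implementation) of pipe-based model parallelism with dynamic repartitioning: each worker owns a contiguous slice of layers, micro-batches are streamed through the stages, and after each iteration a boundary layer is shifted away from a stage that is clearly overloaded (here, a simulated straggler). All names and constants (layer_cost, run_iteration, rebalance, the 3x slowdown, the 1.5x imbalance threshold) are illustrative assumptions, not details taken from the paper.

    # Illustrative sketch only (not the authors' code): pipe-based model parallelism
    # with dynamic repartitioning.  Each worker owns a contiguous slice of layers,
    # micro-batches flow through the stages, and one boundary layer is shifted away
    # from an overloaded stage between iterations.
    NUM_LAYERS = 12        # hypothetical model depth
    NUM_WORKERS = 4        # hypothetical number of pipeline stages
    MICRO_BATCHES = 8      # micro-batches pipelined per training iteration

    def layer_cost(layer_id, worker_id, straggler=None):
        """Simulated per-layer compute time; the straggler runs 3x slower."""
        base = 0.0005 * (1 + layer_id % 3)          # layers are not equally expensive
        return base * (3.0 if worker_id == straggler else 1.0)

    def run_iteration(partition, straggler=None):
        """Accumulate per-worker busy time over one pipelined iteration."""
        busy = [0.0] * len(partition)
        for _ in range(MICRO_BATCHES):
            for w, layers in enumerate(partition):
                busy[w] += sum(layer_cost(l, w, straggler) for l in layers)
        return busy

    def rebalance(partition, busy, threshold=1.5):
        """Shift one boundary layer off the slowest stage, but only when that
        stage is clearly overloaded relative to the average."""
        slow = busy.index(max(busy))
        if busy[slow] <= threshold * (sum(busy) / len(busy)) or len(partition[slow]) <= 1:
            return partition                        # balanced enough; keep the partition
        neighbours = [w for w in (slow - 1, slow + 1) if 0 <= w < len(partition)]
        target = min(neighbours, key=lambda w: busy[w])
        if target < slow:                           # hand the first layer to the left neighbour
            partition[target].append(partition[slow].pop(0))
        else:                                       # hand the last layer to the right neighbour
            partition[target].insert(0, partition[slow].pop())
        return partition

    # Start from an even partition: worker w holds NUM_LAYERS // NUM_WORKERS layers.
    per_worker = NUM_LAYERS // NUM_WORKERS
    partition = [list(range(w * per_worker, (w + 1) * per_worker)) for w in range(NUM_WORKERS)]
    straggler = 2                                   # worker 2 is an artificial straggler

    for it in range(5):
        busy = run_iteration(partition, straggler)
        print(f"iter {it}: partition={partition} busy={[round(b, 3) for b in busy]}")
        partition = rebalance(partition, busy)

In this toy run the partition drifts away from the simulated straggler until the per-stage busy times are roughly even, which mirrors, at a small scale, the load-balancing behaviour that dynamic workload tuning aims for; a real system would additionally have to migrate layer parameters and activations between GPUs.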






      Published In

      ScienceCloud '19: Proceedings of the 10th Workshop on Scientific Cloud Computing
      June 2019
      32 pages
      ISBN:9781450367585
      DOI:10.1145/3322795
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. GPU memory consumption
      2. communication overheads
      3. model parallelism
      4. straggler effect

      Qualifiers

      • Short-paper

      Conference

      HPDC '19

      Acceptance Rates

      ScienceCloud '19 paper acceptance rate: 22 of 106 submissions (21%)
      Overall acceptance rate: 44 of 151 submissions (29%)



      Cited By

      • (2024) Advancements in Accelerating Deep Neural Network Inference on AIoT Devices: A Survey. IEEE Transactions on Sustainable Computing 9(6), 830-847. DOI: 10.1109/TSUSC.2024.3353176. Online publication date: Nov 2024.
      • (2024) Bandwidth Characterization of DeepSpeed on Distributed Large Language Model Training. 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 241-256. DOI: 10.1109/ISPASS61541.2024.00031. Online publication date: 5 May 2024.
      • (2024) Resource- and Workload-Aware Malware Detection through Distributed Computing in IoT Networks. Proceedings of the 29th Asia and South Pacific Design Automation Conference, 368-373. DOI: 10.1109/ASP-DAC58780.2024.10473814. Online publication date: 22 Jan 2024.
      • (2024) A high-performance dataflow-centric optimization framework for deep learning inference on the edge. Journal of Systems Architecture 152, 103180. DOI: 10.1016/j.sysarc.2024.103180. Online publication date: Jul 2024.
      • (2023) Offloading Machine Learning to Programmable Data Planes: A Systematic Survey. ACM Computing Surveys 56(1), 1-34. DOI: 10.1145/3605153. Online publication date: 26 Aug 2023.
      • (2023) Resource- and Workload-Aware Model Parallelism-Inspired Novel Malware Detection for IoT Devices. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42(12), 4618-4628. DOI: 10.1109/TCAD.2023.3290128. Online publication date: Dec 2023.
      • (2023) SmartPipe: Intelligently Freezing Layers in Pipeline Parallelism for Distributed DNN Training. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), 1885-1894. DOI: 10.1109/ICPADS60453.2023.00259. Online publication date: 17 Dec 2023.
      • (2023) Enabling All In-Edge Deep Learning: A Literature Review. IEEE Access 11, 3431-3460. DOI: 10.1109/ACCESS.2023.3234761. Online publication date: 2023.
      • (2023) Layer-wise partitioning and merging for efficient and scalable deep learning. Future Generation Computer Systems 149, 432-444. DOI: 10.1016/j.future.2023.07.043. Online publication date: Dec 2023.
      • (2023) Xenos: Dataflow-Centric Optimization to Accelerate Model Inference on Edge Devices. Database Systems for Advanced Applications, 535-545. DOI: 10.1007/978-3-031-30637-2_35. Online publication date: 14 Apr 2023.
