DOI: 10.1145/3322795.3331461

Horizontal or Vertical?: A Hybrid Approach to Large-Scale Distributed Machine Learning

Published: 17 June 2019

Abstract

Data parallelism and model parallelism are the two typical parallel modes for distributed machine learning (DML). Traditionally, DML has mainly leveraged data parallelism, which maintains one model instance on each node and synchronizes the model parameters at the end of every iteration. However, as models grow larger, the communication cost and GPU memory consumption become significant, so data parallelism fails to work efficiently at large scale, and model-parallel solutions have been proposed in recent years. In this paper, we comprehensively discuss the benefits and drawbacks of both approaches. Based on this comparative analysis, we propose Hove, a hybrid approach that combines data parallelism and model parallelism to balance their overheads and achieve high performance for large-scale DML.
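
As an illustration only (this is not code from the paper, and Hove's actual design is not reproduced here; every function name, tensor size, worker count, and learning rate below is an arbitrary assumption), the following NumPy sketch shows the structural difference the abstract draws on: a data-parallel step averages full-model gradients across worker replicas once per iteration, whereas a model-parallel forward pass partitions the layers so that each worker stores only its slice of the parameters and hands activations to the next partition.

    import numpy as np

    # Toy setup: all sizes and hyperparameters are arbitrary illustrative choices.
    NUM_WORKERS = 4
    rng = np.random.default_rng(0)

    def data_parallel_step(weights, data_shards, lr=0.01):
        """Data parallelism: every worker holds a full replica of `weights`,
        computes a gradient on its own data shard, and the replicas stay
        consistent by averaging the gradients once per iteration (this
        per-iteration exchange is the communication cost the abstract mentions)."""
        grads = []
        for x, y in data_shards:                    # one full-model gradient per worker
            pred = x @ weights
            grads.append(2.0 * x.T @ (pred - y) / len(x))
        avg_grad = sum(grads) / len(grads)          # stands in for an all-reduce
        return weights - lr * avg_grad              # every replica applies the same update

    def model_parallel_forward(layer_weights, x):
        """Model parallelism: layers are partitioned across workers, so each worker
        stores only its own slice of the parameters; what crosses the worker
        boundary is the activation passed to the next partition."""
        activation = x
        for w in layer_weights:                     # imagine each w living on a different worker
            activation = np.maximum(activation @ w, 0.0)   # ReLU layer
        return activation

    # Usage: one data-parallel update over 4 shards, then a model-parallel forward pass.
    weights = rng.normal(size=(8, 1))
    shards = [(rng.normal(size=(16, 8)), rng.normal(size=(16, 1))) for _ in range(NUM_WORKERS)]
    weights = data_parallel_step(weights, shards)

    layers = [rng.normal(size=(8, 8)) for _ in range(NUM_WORKERS)]
    print(model_parallel_forward(layers, rng.normal(size=(2, 8))).shape)   # (2, 8)

Even in this toy form the trade-off the paper targets is visible: the data-parallel step exchanges a gradient the size of the whole model every iteration, while the naive model-parallel path exchanges only activations but serializes work across the partitions.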

      Published In

      ScienceCloud '19: Proceedings of the 10th Workshop on Scientific Cloud Computing
      June 2019
      32 pages
      ISBN:9781450367585
      DOI:10.1145/3322795

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. GPU utilization
      2. communication overhead
      3. data parallelism
      4. hybrid approach
      5. model parallelism

      Qualifiers

      • Short-paper

      Conference

      HPDC '19

      Acceptance Rates

      ScienceCloud '19 Paper Acceptance Rate 22 of 106 submissions, 21%;
      Overall Acceptance Rate 44 of 151 submissions, 29%

      Bibliometrics & Citations

      Article Metrics

      • Downloads (Last 12 months): 38
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 17 Feb 2025

      Cited By

      • (2024) "A high-performance dataflow-centric optimization framework for deep learning inference on the edge." Journal of Systems Architecture, 152:103180. DOI: 10.1016/j.sysarc.2024.103180. Online publication date: Jul-2024.
      • (2024) "A comprehensive survey and taxonomy on privacy-preserving deep learning." Neurocomputing, 576:C. DOI: 10.1016/j.neucom.2024.127345. Online publication date: 25-Jun-2024.
      • (2022) "FuncPipe: A Pipelined Serverless Framework for Fast and Cost-Efficient Training of Deep Learning Models." Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6(3):1-30. DOI: 10.1145/3570607. Online publication date: 8-Dec-2022.
      • (2022) "BaPipe: Balanced Pipeline Parallelism for DNN Training." Parallel Processing Letters, 32(03n04). DOI: 10.1142/S0129626422500050. Online publication date: 19-Aug-2022.
      • (2021) "DAPPLE." Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 431-445. DOI: 10.1145/3437801.3441593. Online publication date: 17-Feb-2021.
      • (2019) "Accelerating Distributed Machine Learning by Smart Parameter Server." Proceedings of the 3rd Asia-Pacific Workshop on Networking, 92-98. DOI: 10.1145/3343180.3343192. Online publication date: 17-Aug-2019.
      • (2019) "DLBooster." Proceedings of the 48th International Conference on Parallel Processing, 1-11. DOI: 10.1145/3337821.3337892. Online publication date: 5-Aug-2019.
      • (2019) "Grundzüge des maschinellen Lernens." Blockchain und maschinelles Lernen, 89-142. DOI: 10.1007/978-3-662-60408-3_3. Online publication date: 28-Nov-2019.
