Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models

  • Conference paper
  • In: Euro-Par 2024: Parallel Processing (Euro-Par 2024)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14802)


Abstract

Hybrid parallelism is popular in training large language models (LLMs). However, existing efforts have focused on optimizing individual strategies in hybrid parallelism, such as pipeline scheduling and device assignment, which limits overall training efficiency. This paper explores the intricate dependencies among four pivotal strategies (model scaling, model splitting, pipeline scheduling, and device assignment) and proposes Quartet, a holistic hybrid parallel framework for their joint optimization. The novelty lies in the formulation of parameterized pipeline scheduling and device assignment, alongside a pioneering analysis of model scaling's impact on throughput. These provide the basis for orchestrating the four strategies within a unified framework to maximize the overall training throughput efficiently. Evaluation results show that, for representative LLMs, Quartet improves training throughput by up to 2.16× over state-of-the-art synchronous hybrid parallel approaches.
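To make the joint-optimization idea concrete, the sketch below frames the four strategies named in the abstract as a single configuration space searched against a throughput estimate. It is an illustrative toy only, not Quartet's method: the strategy ranges, the estimate_throughput cost model, and all names and numbers are assumptions invented for this example.

from itertools import product

# Hypothetical search space for the four strategies discussed in the abstract.
# All names and ranges are illustrative assumptions, not Quartet's actual design.
MODEL_SCALES = [1.0, 1.3, 2.7]                   # model size multipliers (model scaling)
PIPELINE_STAGES = [2, 4, 8]                      # number of pipeline stages (model splitting)
SCHEDULES = ["1F1B", "interleaved"]              # pipeline scheduling policies
DEVICE_MAPPINGS = ["contiguous", "round_robin"]  # device assignment policies

def estimate_throughput(scale, stages, schedule, mapping, num_gpus=16):
    """Toy analytical cost model (placeholder, not from the paper).

    Returns a rough samples/second figure under crude assumptions: compute
    time grows with model scale, pipeline bubbles shrink with better
    scheduling, and communication cost depends on the device mapping.
    """
    compute = scale * 1.0
    bubble = 0.15 if schedule == "interleaved" else 0.25
    comm = 0.05 if mapping == "contiguous" else 0.10
    per_stage = compute / stages
    step_time = per_stage * (1 + bubble) + comm
    return num_gpus / (step_time * stages)

def joint_search():
    """Exhaustively evaluate all combinations and keep the best one."""
    best = max(
        product(MODEL_SCALES, PIPELINE_STAGES, SCHEDULES, DEVICE_MAPPINGS),
        key=lambda cfg: estimate_throughput(*cfg),
    )
    return best, estimate_throughput(*best)

if __name__ == "__main__":
    cfg, tput = joint_search()
    print(f"best config {cfg} -> estimated {tput:.2f} samples/s")

Exhaustive enumeration like this does not scale to real clusters; per the abstract, Quartet instead parameterizes pipeline scheduling and device assignment and analyzes model scaling's effect on throughput so that the joint space can be explored efficiently, but those formulations are not reproduced here.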



Author information


Corresponding author

Correspondence to Biyu Zhou.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, W. et al. (2024). Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models. In: Carretero, J., Shende, S., Garcia-Blas, J., Brandic, I., Olcoz, K., Schreiber, M. (eds) Euro-Par 2024: Parallel Processing. Euro-Par 2024. Lecture Notes in Computer Science, vol 14802. Springer, Cham. https://doi.org/10.1007/978-3-031-69766-1_29

  • DOI: https://doi.org/10.1007/978-3-031-69766-1_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-69765-4

  • Online ISBN: 978-3-031-69766-1

  • eBook Packages: Computer Science (R0)
