Abstract
Hybrid parallelism is widely used for training large language models (LLMs). However, existing efforts focus on optimizing individual strategies within hybrid parallelism, such as pipeline scheduling or device assignment, which limits overall training efficiency. This paper explores the intricate dependencies among four pivotal strategies (model scaling, model splitting, pipeline scheduling, and device assignment) and proposes Quartet, a holistic hybrid parallel framework that optimizes them jointly. The novelty lies in the formulation of parameterized pipeline scheduling and device assignment, together with a pioneering analysis of how model scaling affects throughput. These results provide the basis for orchestrating the four strategies within a unified framework that efficiently maximizes overall training throughput. Evaluation results show that, for representative LLMs, Quartet improves training throughput by up to 2.16× over state-of-the-art synchronous hybrid parallel approaches.
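The abstract does not detail Quartet's optimization procedure, so the sketch below is only a hypothetical illustration of the general idea of joint optimization: enumerate candidate configurations of the four strategies (model scaling, model splitting, pipeline scheduling, device assignment) and rank them with an analytical throughput estimate. The class and function names, the candidate grids, and the toy cost model are all assumptions for illustration, not Quartet's actual formulation.

```python
# Illustrative sketch only: joint search over four hybrid-parallel strategies,
# ranked by a toy analytical throughput model. Not Quartet's real algorithm.
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Plan:
    num_layers: int         # model scaling: depth of the scaled model
    num_stages: int         # model splitting: number of pipeline stages
    schedule: str           # pipeline scheduling: e.g. "1F1B" or "interleaved"
    devices_per_stage: int  # device assignment: data-parallel width per stage


def estimated_throughput(plan: Plan, num_devices: int = 16,
                         microbatches: int = 32) -> float:
    """Toy throughput estimate (samples per time unit); purely illustrative."""
    if plan.num_stages * plan.devices_per_stage != num_devices:
        return 0.0  # infeasible device assignment
    layers_per_stage = plan.num_layers / plan.num_stages
    stage_time = layers_per_stage * 1.0             # time per micro-batch per stage
    bubble = (plan.num_stages - 1) * stage_time     # 1F1B-style pipeline bubble
    if plan.schedule == "interleaved":
        bubble /= 2                                 # assume interleaving halves the bubble
    total_time = microbatches * stage_time + bubble
    samples = microbatches * plan.devices_per_stage
    return samples / total_time


def joint_search(num_devices: int = 16) -> Plan:
    """Exhaustively enumerate the (small) joint space of the four strategies."""
    candidates = (
        Plan(layers, stages, sched, num_devices // stages)
        for layers, stages, sched in product(
            [24, 32, 48],                # model scaling choices (hypothetical)
            [1, 2, 4, 8],                # model splitting choices
            ["1F1B", "interleaved"],     # pipeline schedules
        )
        if num_devices % stages == 0
    )
    return max(candidates, key=lambda p: estimated_throughput(p, num_devices))


if __name__ == "__main__":
    best = joint_search()
    print(best, f"~{estimated_throughput(best):.2f} samples/unit time")
```

In practice the search would be driven by the parameterized scheduling and device-assignment formulations and the model-scaling throughput analysis the paper introduces, rather than a toy grid; the point of the sketch is only the structure of a single objective evaluated over the joint space of all four strategies.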
About this paper
Cite this paper
Zhang, W. et al. (2024). Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models. In: Carretero, J., Shende, S., Garcia-Blas, J., Brandic, I., Olcoz, K., Schreiber, M. (eds) Euro-Par 2024: Parallel Processing. Euro-Par 2024. Lecture Notes in Computer Science, vol 14802. Springer, Cham. https://doi.org/10.1007/978-3-031-69766-1_29