Abstract
Synchronous distributed data parallel (SDDP) training is widely employed in distributed deep learning systems to train DNN models on large datasets. The performance of SDDP training depends essentially on the communication overhead and the statistical efficiency. However, existing approaches optimize only one of the two, either the communication overhead or the statistical efficiency, to accelerate SDDP training. In this paper, we combine the advantages of both lines of work and design a new approach, namely SkipSMA, that benefits from both low communication overhead and high statistical efficiency. In particular, we exploit a skipping strategy with an adaptive interval to decrease the communication frequency, which guarantees low communication overhead. Moreover, we employ a correction technique to mitigate model divergence while keeping small batch sizes, which ensures high statistical efficiency. To demonstrate the performance of SkipSMA, we integrate it into TensorFlow. Our experiments show that SkipSMA outperforms state-of-the-art solutions for SDDP training, achieving, e.g., a 6.88x speedup over SSGD.
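To make the general idea of the abstract concrete, the following is a minimal, self-contained sketch (plain NumPy, single process, simulated workers) of communication-skipping data-parallel SGD: workers run local SGD steps, skip synchronization for an adaptive number of steps, then average their models and apply a correction toward the average. The interval-adaptation rule, the correction coefficient, and all names below are illustrative assumptions, not the exact SkipSMA formulas from the paper.

```python
# Illustrative simulation of communication-skipping data-parallel SGD.
# NOTE: the adaptive-interval rule and the correction term are hypothetical;
# they are not the SkipSMA algorithm as published.
import numpy as np

np.random.seed(0)
NUM_WORKERS, DIM, STEPS, LR = 4, 10, 200, 0.1
A = np.random.randn(DIM, DIM)
A = A @ A.T / DIM  # toy quadratic loss 0.5 * w^T A w

def grad(w, noise=0.05):
    # Stochastic gradient of the toy loss; noise mimics small-batch sampling.
    return A @ w + noise * np.random.randn(DIM)

workers = [np.random.randn(DIM) for _ in range(NUM_WORKERS)]
interval, since_sync = 1, 0  # current skipping interval and steps since last sync

for step in range(STEPS):
    # Local SGD step on every worker, no communication.
    for i in range(NUM_WORKERS):
        workers[i] -= LR * grad(workers[i])
    since_sync += 1

    if since_sync >= interval:  # time to synchronize
        avg = np.mean(workers, axis=0)  # model averaging (stands in for all-reduce)
        divergence = np.mean([np.linalg.norm(w - avg) for w in workers])
        for i in range(NUM_WORKERS):
            # Hypothetical correction: pull each replica toward the average
            # instead of overwriting it outright.
            workers[i] = avg + 0.1 * (workers[i] - avg)
        # Hypothetical adaptive rule: sync less often when replicas agree,
        # more often when they have drifted apart.
        interval = min(interval + 1, 8) if divergence < 0.5 else max(interval - 1, 1)
        since_sync = 0

w_avg = np.mean(workers, axis=0)
print("final loss:", 0.5 * w_avg @ A @ w_avg)
```

The design choice being illustrated is the trade-off the abstract describes: a larger skipping interval lowers communication overhead, while the correction at each synchronization point limits the divergence that skipping introduces, so small per-worker batch sizes (and hence good statistical efficiency) can be retained.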
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 62272168), and Natural Science Foundation of Shanghai (No. 23ZR1419900).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Sun, Y., Bi, N., Xu, C., Niu, Y., Zhou, H. (2024). Accelerating Synchronous Distributed Data Parallel Training with Small Batch Sizes. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14854. Springer, Singapore. https://doi.org/10.1007/978-981-97-5569-1_33
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5568-4
Online ISBN: 978-981-97-5569-1