Poster · DOI: 10.1145/3437801.3441624

Dynamic scaling for low-precision learning

Published: 17 February 2021

Abstract

In recent years, distributed deep learning has become popular in both industry and academia. Although researchers want to use distributed systems for training, the communication cost of synchronizing gradients has been reported to be a bottleneck. Communicating gradients at low precision is a promising technique for reducing the bandwidth requirement. In this work, we propose Auto Precision Scaling (APS), an algorithm that improves accuracy when gradients are communicated as low-precision floating-point values. APS improves accuracy at every precision with only a trivial communication cost. Our experimental results show that, for both image classification and segmentation, APS can train state-of-the-art models with 8-bit floating-point gradients at no or only a tiny accuracy loss (<0.05%). Furthermore, we can avoid any accuracy loss by designing a hybrid-precision technique. Finally, we propose a performance model to evaluate the proposed method; our experimental results show that APS achieves a significant speedup over the state-of-the-art method. To make this available to researchers and developers, we design and implement a high-performance system for customized-precision deep learning (CPD), which can simulate the training process using an arbitrary low-precision customized floating-point format. We integrate CPD into PyTorch and release it as open source.
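
To make the two ideas above concrete, the sketch below shows, in PyTorch, (a) how a CPD-style simulator can round a tensor to a customized floating-point format with a chosen exponent and mantissa width, and (b) how an APS-style round trip can rescale a gradient into that format's representable range before communication and undo the scale afterwards. The function names, the default format parameters, and the power-of-two scaling heuristic are illustrative assumptions of ours; the paper's actual algorithm and CPD's implementation may differ in details such as subnormal and special-value handling.

```python
import torch

def quantize_custom_float(x: torch.Tensor, exp_bits: int = 4, man_bits: int = 3) -> torch.Tensor:
    """Round `x` to a simulated floating-point format with the given exponent
    and mantissa widths (sign bit implied). Subnormals and NaN/Inf encodings
    are ignored for brevity -- this is a sketch, not CPD's implementation."""
    bias = 2 ** (exp_bits - 1) - 1
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** bias   # largest representable magnitude
    min_normal = 2.0 ** (1 - bias)                     # smallest normal magnitude
    sign, mag = x.sign(), x.abs()
    exp = torch.floor(torch.log2(mag.clamp(min=min_normal)))
    step = 2.0 ** (exp - man_bits)                     # quantization step within each binade
    q = torch.round(mag / step) * step
    q = torch.where(mag < min_normal / 2, torch.zeros_like(q), q)  # flush underflow to zero
    return sign * q.clamp(max=max_val)

def scaled_lowp_gradient(grad: torch.Tensor, world_size: int = 1,
                         exp_bits: int = 4, man_bits: int = 3) -> torch.Tensor:
    """APS-style round trip: pick a power-of-two scale that moves the largest
    gradient magnitude near the top of the low-precision range (leaving
    headroom for the sum over `world_size` workers), quantize, then undo the
    scale. The scale is one full-precision scalar per tensor, so the extra
    communication cost is trivial."""
    bias = 2 ** (exp_bits - 1) - 1
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** bias
    g_max = grad.abs().max().clamp(min=1e-30)          # avoid log2(0) on all-zero gradients
    scale = 2.0 ** torch.floor(torch.log2(max_val / (g_max * world_size)))
    q = quantize_custom_float(grad * scale, exp_bits, man_bits)
    # In a real distributed run, `q` would be all-reduced across workers here
    # (e.g. with torch.distributed.all_reduce) before dividing the scale back out.
    return q / scale
```

The rescaling step is what does the work: an unscaled gradient whose entries sit below the format's smallest normal magnitude would be flushed to zero during quantization, so shifting the tensor's dynamic range before communication, and sending the single full-precision scale alongside it, is what lets 8-bit floats preserve the gradient signal at trivial extra cost.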


Cited By

  • (2025) Efficient deep neural network training via decreasing precision with layer capacity. Frontiers of Computer Science 19:10. DOI: 10.1007/s11704-024-40669-3. Online publication date: 1-Oct-2025.
  • (2024) Systematic Analysis of Low-Precision Training in Deep Neural Networks: Factors Influencing Matrix Computations. Applied Sciences 14:21 (10025). DOI: 10.3390/app142110025. Online publication date: 2-Nov-2024.
  • (2023) Love of Variety Based Latency Analysis for High Definition Map Updating: Age of Information and Distributional Robust Perspectives. IEEE Transactions on Intelligent Vehicles 8:2 (1751-1764). DOI: 10.1109/TIV.2022.3224655. Online publication date: Feb-2023.



    Published In

    PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
    February 2021
    507 pages
    ISBN: 9781450382946
    DOI: 10.1145/3437801
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 February 2021


    Author Tags

    1. distributed training
    2. low precision

    Qualifiers

    • Poster

    Conference

    PPoPP '21

    Acceptance Rates

    PPoPP '21 paper acceptance rate: 31 of 150 submissions (21%)
    Overall acceptance rate: 230 of 1,014 submissions (23%)




