Abstract
Deep neural networks (DNNs) are becoming deeper and larger, making memory one of the most critical bottlenecks in training. Researchers have found that the feature maps generated during DNN training occupy the major portion of the memory footprint, and have therefore proposed encoding the feature maps in the forward pass and decoding them in the backward pass to reduce memory demand. However, we observe that encoding and decoding are time-consuming, severely slowing down DNN training. To solve this problem, we present EPMC, an efficient parallel memory compression framework that simultaneously reduces the memory footprint and the impact of encoding/decoding on DNN training. Our framework employs pipeline parallelism and specific-layer parallelism for encoding and decoding to reduce their impact on overall training, and combines precision reduction with encoding to improve the data compression ratio. We evaluate EPMC across four state-of-the-art DNNs. Experimental results show that EPMC reduces the memory footprint during training by 2.3 times on average without accuracy loss. In addition, it reduces DNN training time by more than 2.1 times on average compared with an unoptimized encoding/decoding scheme. Moreover, compared with the common Compressed Sparse Row compression scheme, EPMC achieves a 2.2 times higher data compression ratio.
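To make the core mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of the idea the abstract describes: compressing a feature map after the forward pass and decompressing it in the backward pass. It is not the authors' EPMC code; the "encoding" here is plain FP32-to-FP16 precision reduction, and EPMC's actual encoder and its pipeline/specific-layer parallelism are omitted.

```python
# Sketch only: stores a ReLU feature map in half precision between the
# forward and backward passes, illustrating the encode/decode pattern
# that EPMC accelerates. All names here are illustrative.
import torch

class CompressedActivation(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = torch.relu(x)
        # "Encode": keep the saved feature map in FP16 to cut its memory roughly in half.
        ctx.save_for_backward(y.to(torch.float16))
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (y_fp16,) = ctx.saved_tensors
        # "Decode": recover the mask needed for the ReLU gradient from the
        # compressed feature map (y > 0 iff x > 0 for ReLU).
        return grad_out * (y_fp16 > 0).to(grad_out.dtype)

x = torch.randn(8, 64, 32, 32, requires_grad=True)
CompressedActivation.apply(x).sum().backward()
print(x.grad.shape)  # gradients flow through the compressed activation
```

In this sketch the encode/decode steps run inline with training, which is exactly the overhead the paper targets; EPMC instead overlaps them with computation via pipelining.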
Acknowledgements
This research was partially supported by the National Natural Science Foundation of China (Grant Nos. 62072165 and U19A2058), the Open Research Projects of Zhejiang Lab (No. 2020KE0AB01), and the Fundamental Research Funds for the Central Universities.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chen, Z., Yang, S., Liu, C. et al. EPMC: efficient parallel memory compression in deep neural network training. Neural Comput & Applic 34, 757–769 (2022). https://doi.org/10.1007/s00521-021-06433-5