Abstract
Deep neural networks (DNNs) are becoming deeper and larger, making memory one of the most critical bottlenecks in training. Researchers have found that the feature maps generated during DNN training occupy the major portion of the memory footprint, and have therefore proposed encoding the feature maps in the forward pass and decoding them in the backward pass to reduce memory demand. However, we observe that encoding and decoding are time-consuming, severely slowing down DNN training. To solve this problem, we present EPMC, an efficient parallel memory compression framework that simultaneously reduces the memory footprint and the impact of encoding/decoding on DNN training. Our framework employs pipeline parallelism and specific-layer parallelism for encoding and decoding to reduce their impact on overall training, and combines precision reduction with encoding to improve the data compression ratio. We evaluate EPMC across four state-of-the-art DNNs. Experimental results show that EPMC reduces the memory footprint during training by 2.3 times on average without accuracy loss. In addition, it reduces DNN training time by more than 2.1 times on average compared with an unoptimized encoding/decoding scheme. Moreover, compared with the common Compressed Sparse Row compression scheme, EPMC achieves a 2.2 times higher data compression ratio.
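To make the core mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of the idea the abstract describes: compressing a feature map after the forward pass and decompressing it in the backward pass. It is not the authors' EPMC code; the "encoding" here is plain FP32-to-FP16 precision reduction, and EPMC's actual encoder and its pipeline/specific-layer parallelism are omitted.

```python
# Sketch only: stores a ReLU feature map in half precision between the
# forward and backward passes, illustrating the encode/decode pattern
# that EPMC accelerates. All names here are illustrative.
import torch

class CompressedActivation(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = torch.relu(x)
        # "Encode": keep the saved feature map in FP16 to cut its memory roughly in half.
        ctx.save_for_backward(y.to(torch.float16))
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (y_fp16,) = ctx.saved_tensors
        # "Decode": recover the mask needed for the ReLU gradient from the
        # compressed feature map (y > 0 iff x > 0 for ReLU).
        return grad_out * (y_fp16 > 0).to(grad_out.dtype)

x = torch.randn(8, 64, 32, 32, requires_grad=True)
CompressedActivation.apply(x).sum().backward()
print(x.grad.shape)  # gradients flow through the compressed activation
```

In this sketch the encode/decode steps run inline with training, which is exactly the overhead the paper targets; EPMC instead overlaps them with computation via pipelining.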
Acknowledgements
This research was partially supported by the National Natural Science Foundation of China (Grant Nos. 62072165 and U19A2058), the Open Research Projects of Zhejiang Lab (No. 2020KE0AB01), and the Fundamental Research Funds for the Central Universities.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chen, Z., Yang, S., Liu, C. et al. EPMC: efficient parallel memory compression in deep neural network training. Neural Comput & Applic 34, 757–769 (2022). https://doi.org/10.1007/s00521-021-06433-5