Abstract
This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs). GPUs are among the most efficient and most commonly used accelerators for deep learning computations, and the NVIDIA CUDA Deep Neural Network (cuDNN) library provides some of the most effective implementations of deep learning (DL) algorithms for them. Modern CNN models require megabytes of coefficients and millions of multiply-accumulate (MAC) operations to perform a convolution. One of the most common techniques for compressing CNN models is weight pruning. There are two main types of pruning: structural (removing whole weight channels) and non-structural (removing individual weights). The first makes acceleration much easier, but it is difficult to reach sparsity levels and accuracy as high as those obtained with the second. Non-structural pruning with retraining can produce weight matrices with \({\sim }90\%\) or more sparsity in some deep CNN models. This work shows when it is worth using a direct sparse operation to speed up the computation of convolution layers. The VGG-16, CNN-non-static, and \(1 \times 1\) layers from ResNet models were used as benchmarks. In addition, we present the impact of using reduced precision on time efficiency.
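To make the idea of a direct sparse operation concrete, the following is a minimal CPU reference sketch, not the authors' GPU implementation: the convolution is lowered with im2col and the pruned weights are stored in CSR form, so the sparse-dense GEMM only spends MAC operations on surviving nonzero weights. Stride 1, no padding, a single input image, and the function name direct_sparse_conv2d are our own illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

def direct_sparse_conv2d(x, w_sparse, kh, kw):
    """Convolve x (c_in, h, w) with CSR weights (c_out, c_in*kh*kw):
    lower the input with im2col, then run a sparse-dense GEMM so that
    only the nonzero (unpruned) weights contribute MAC operations."""
    c_in, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((c_in * kh * kw, oh * ow), dtype=x.dtype)
    row = 0
    for c in range(c_in):
        for i in range(kh):
            for j in range(kw):
                cols[row] = x[c, i:i + oh, j:j + ow].reshape(-1)
                row += 1
    out = np.asarray(w_sparse @ cols)   # sparse GEMM skips zero weights
    return out.reshape(-1, oh, ow)      # (c_out, oh, ow)

# Magnitude pruning to high sparsity, then convert weights to CSR.
rng = np.random.default_rng(0)
w = rng.standard_normal((16, 3 * 3 * 3)).astype(np.float32)
w[np.abs(w) < 1.645] = 0.0              # ~90% sparsity, as in heavily pruned models
x = rng.standard_normal((3, 32, 32)).astype(np.float32)
y = direct_sparse_conv2d(x, csr_matrix(w), 3, 3)
print(y.shape)                          # (16, 30, 30)
```

Whether such a sparse GEMM beats a dense cuDNN convolution depends on the sparsity level and layer shape, which is precisely the trade-off the paper benchmarks.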
Acknowledgment
This work was supported by funds provided by AGH University of Science and Technology in 2020.