
Accelerating Deep Convolutional Neural Networks on GPGPU

  • Conference paper
  • First Online:
Intelligent Computing

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 284)


Abstract

This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs). GPUs are among the most efficient and most commonly used accelerators for deep learning computations, and the Nvidia CUDA Deep Neural Network (cuDNN) library provides the most effective implementations of deep learning (DL) algorithms for them. Modern CNN models require megabytes of coefficients and millions of multiply-accumulate (MAC) operations to perform convolution. One of the most common techniques for compressing CNN models is weight pruning, which comes in two main types: structural (removing whole weight channels) and non-structural (removing individual weights). The first enables much easier acceleration, but it is difficult to achieve sparsity levels and accuracy as high as those obtained with the second. Non-structural pruning with retraining can produce weight matrices with \({\sim }90\%\) or more sparsity in some deep CNN models. This work shows when it is worth using direct sparse operations to speed up the computation of convolutional layers. The VGG-16 and CNN-non-static models, as well as \(1 \times 1\) layers from ResNet models, were used as benchmarks. In addition, we present the impact of using reduced precision on time efficiency.
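The direct sparse approach mentioned in the abstract can be sketched in a few lines. The example below is a minimal illustration, not the authors' GPU implementation: it applies non-structural magnitude pruning to a dense weight tensor, stores the surviving weights in CSR format, and computes the convolution as an im2col transform followed by a sparse-dense matrix product. The function names, the SciPy-based CPU backend, and the 90% sparsity target are assumptions made for this example only.

    # Minimal sketch (illustrative only, not the authors' implementation):
    # non-structural magnitude pruning followed by a direct sparse convolution,
    # computed as im2col + CSR sparse-dense matrix multiplication.
    import numpy as np
    from scipy import sparse

    def magnitude_prune(weights, sparsity=0.9):
        """Zero out the smallest-magnitude weights until roughly `sparsity` is reached."""
        threshold = np.quantile(np.abs(weights), sparsity)
        return np.where(np.abs(weights) < threshold, 0.0, weights)

    def im2col(x, kh, kw):
        """Unfold an input of shape (C, H, W) into columns (stride 1, no padding)."""
        c, h, w = x.shape
        out_h, out_w = h - kh + 1, w - kw + 1
        cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
        for i in range(out_h):
            for j in range(out_w):
                cols[:, i * out_w + j] = x[:, i:i + kh, j:j + kw].ravel()
        return cols, out_h, out_w

    def direct_sparse_conv(x, pruned_weights):
        """Convolution as a CSR (sparse weights) x dense (unfolded input) product."""
        n_filters, _, kh, kw = pruned_weights.shape
        w_csr = sparse.csr_matrix(pruned_weights.reshape(n_filters, -1))
        cols, out_h, out_w = im2col(x, kh, kw)
        out = w_csr @ cols  # only the non-zero weights contribute to the product
        return out.reshape(n_filters, out_h, out_w)

    # Toy VGG-like 3x3 layer with ~90% of the weights pruned away.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((64, 56, 56)).astype(np.float32)
    w = rng.standard_normal((128, 64, 3, 3)).astype(np.float32)
    y = direct_sparse_conv(x, magnitude_prune(w, sparsity=0.9))
    print(y.shape)  # (128, 54, 54)

On a GPU the same data layout maps naturally onto cuSPARSE CSR routines (see the Notes below), and weights and activations can additionally be stored in half precision to study the reduced-precision effect the abstract refers to.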


Notes

  1. https://developer.nvidia.com/cudnn.

  2. https://developer.nvidia.com/cublas.

  3. https://docs.nvidia.com/cuda/cusparse.

  4. http://image-net.org/challenges/LSVRC/2014/.

  5. https://docs.nvidia.com/cuda/cuda-math-api/.

  6. https://www.nvidia.com/en-us/data-center/v100/.

References

  1. Adámek, K., Dimoudi, S., Giles, M., Armour, W.: GPU fast convolution via the overlap-and-save method in shared memory (2019)


  2. Al-Hami, M., Pietron, M., Casas, R., Wielgosz, M.: Methodologies of compressing a stable performance convolutional neural networks in image classification, January 2020


  3. Chen, X.: Escoin: efficient sparse convolutional neural network inference on GPUs (2018)


  4. Chetlur, S., et al.: cuDNN: efficient primitives for deep learning (2014)


  5. Dongarra, J.J., Hammarling, S., Higham, N.J., Relton, S.D., Valero-Lara, P., Zounon, M.: The design and performance of batched BLAS on modern high-performance computing systems. In: ICCS (2017)


  6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)


  7. Jordà, M., Valero-Lara, P., Peña, A.J.: Performance evaluation of cuDNN convolution algorithms on NVIDIA Volta GPUs. IEEE Access 7, 70461–70473 (2019)


  8. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1746–1751. Association for Computational Linguistics, October 2014


  9. Krizhevsky, A., Sutskever, I., Hinton, G.: Imagenet classification with deep convolutional neural networks. Neural Inf. Process. Syst. 25, 01 (2012)


  10. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)


  11. Lavin, A., Gray, S.: Fast algorithms for convolutional neural networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4013–4021 (2016)


  12. Lee, H., Kwon, H.: Going deeper with contextual CNN for hyperspectral image classification. IEEE Trans. Image Process. 26(10), 4843–4855 (2017)


  13. Liu, B., Wang, M., Foroosh, H., Tappen, M., Pensky, M.: Sparse convolutional neural networks, pp. 806–814 (2015)


  14. Lu, L., Liang, Y.: SpWA: an efficient sparse winograd convolutional neural networks accelerator on FPGAs. In: 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1–6 (2018)


  15. Park, J., et al.: Faster CNNs with direct sparse convolutions and guided pruning (2016)


  16. Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: EMNLP, vol. 14, pp. 1532–1543 (2014)


  17. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. CoRR, abs/1710.05941 (2017)


  18. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 06 (2015)


  19. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)

  20. Szegedy, C., et al.: Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015)


  21. Winograd, S.: Arithmetic Complexity of Computations. CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (1980)


  22. Wróbel, K., Karwatowski, M., Wielgosz, M., Pietroń, M., Wiatr, K.: Compression of convolutional neural network for natural language processing. Comput. Sci. 21(1) (2020)


  23. Yin, W., Kann, K., Yu, M., Schütze, H.: Comparative study of CNN and RNN for natural language processing (2017)


  24. Zhu, F., Pool, J., Andersch, M., Appleyard, J., Xie, F.: Sparse persistent RNNs: squeezing large recurrent networks on-chip (2018)



Acknowledgment

This work has been supported by funds provided by AGH University of Science and Technology in 2020.

Author information

Corresponding author

Correspondence to Dominik Żurek.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Żurek, D., Pietroń, M., Wiatr, K. (2021). Accelerating Deep Convolutional Neural Networks on GPGPU. In: Arai, K. (eds) Intelligent Computing. Lecture Notes in Networks and Systems, vol 284. Springer, Cham. https://doi.org/10.1007/978-3-030-80126-7_50
