
cuSCNN: An Efficient CUDA Implementation of Sparse CNNs

Published: 19 July 2023. DOI: 10.1145/3597031.3597057

Abstract

Deep neural network models are becoming much larger, which greatly increases their computation and memory requirements. Sparsity offers great opportunities to reduce unnecessary data transfers and computations. However, exploiting sparsity in CNN inference presents challenges such as irregular memory access patterns. To overcome this challenge, we propose cuSCNN, an efficient sparse CNN inference engine that leverages the sparsity of both models and activations using optimized sparse-sparse matrix convolution kernels with compressed operands. cuSCNN is motivated by the concepts introduced by the SCNN hardware accelerator [21], but modifies them appropriately to achieve an efficient software implementation for GPUs. We develop GPU optimizations that boost execution performance and reduce the required memory size and bandwidth. Without batching, cuSCNN achieves a speedup of up to 171× over an efficient CPU implementation and up to 30× over a multi-threaded CPU implementation, enabling inexpensive, memory-constrained low-end GPUs to run large networks with near-real-time latency. Although GPU throughput can benefit from larger batch sizes, a batch size of 1 achieves the lowest latency, so we focus on that case.
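
To illustrate the approach the abstract describes, the sketch below mimics the SCNN-style sparse-sparse convolution dataflow [21] on a GPU: for a given input channel, every nonzero activation is paired with every nonzero weight, and each product is scatter-accumulated into the output position it contributes to, so zero-valued operands never consume multiplies or bandwidth. This is a minimal illustration written for this summary, not cuSCNN's actual kernel; the struct layouts, names, and launch configuration are assumptions.

#include <cuda_runtime.h>

// Compressed (nonzero-only) operands for one input channel.
struct SparseAct { float val; int x, y; };    // activation value and spatial position
struct SparseWgt { float val; int r, s, k; }; // weight value, kernel offset, output channel

// One thread per (activation, weight) pair in the Cartesian product:
// every multiply performed is guaranteed useful, which is the core idea
// behind sparse-sparse convolution with compressed operands.
__global__ void sparseConvPairs(const SparseAct* acts, int nActs,
                                const SparseWgt* wgts, int nWgts,
                                float* out, int outW, int outH)
{
    long long pair = blockIdx.x * (long long)blockDim.x + threadIdx.x;
    if (pair >= (long long)nActs * nWgts) return;

    SparseAct a = acts[pair / nWgts];
    SparseWgt w = wgts[pair % nWgts];

    // Output coordinate for a unit-stride convolution; contributions that
    // fall outside the output extent are discarded.
    int ox = a.x - w.r;
    int oy = a.y - w.s;
    if (ox < 0 || oy < 0 || ox >= outW || oy >= outH) return;

    // Scatter-accumulate. Collisions on the same output element are resolved
    // with an atomic add here; the SCNN accelerator instead routes products
    // through a crossbar into banked accumulator buffers.
    atomicAdd(&out[((long long)w.k * outH + oy) * outW + ox], a.val * w.val);
}

A production kernel would additionally loop over input channels, tile the work, and privatize partial sums in shared memory to avoid global atomics; those GPU optimizations are where the paper's contribution lies, and the sketch above deliberately omits them.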

References

[1]
Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 265–283.
[2]
Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[3]
Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Return of the Devil in the Details: Delving Deep into Convolutional Nets. CoRR abs/1405.3531 (2014).
[4]
Leiyu Chen, Shaobo Li, Qiang Bai, Jing Yang, Sanlong Jiang, and Yanming Miao. 2021. Review of image classification algorithms based on convolutional neural networks. Remote Sensing 13, 22 (2021), 4712.
[5]
Xuhao Chen. 2019. Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs. arXiv preprint arXiv:1802.10280 (2019).
[6]
Thibaud Ehret and Gabriele Facciolo. 2019. A study of two CNN demosaicking algorithms. Image Processing On Line 9 (2019), 220–230.
[7]
Trevor Gale, Erich Elsen, and Sara Hooker. 2019. The State of Sparsity in Deep Neural Networks. Technical Report. arXiv:1902.09574v1. https://bit.ly/2T8hBGn
[8]
Mathew Hall and Vaughn Betz. 2020. HPIPE: Heterogeneous layer-pipelined and sparse-aware CNN inference for FPGAs. arXiv preprint arXiv:2007.10451 (2020).
[9]
Ademola E Ilesanmi and Taiwo O Ilesanmi. 2021. Methods for image denoising using convolutional neural network: a review. Complex & Intelligent Systems 7, 5 (2021), 2179–2198.
[10]
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia. 675–678.
[11]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States. 1106–1114. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
[12]
D. Li and Z. Wang. 2017. Video Superresolution via Motion Compensation and Deep Residual Learning. IEEE Transactions on Computational Imaging 3, 4 (Dec 2017), 749–762. https://doi.org/10.1109/TCI.2017.2671360
[13]
D. Martin, C. Fowlkes, D. Tal, and J. Malik. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings Eighth IEEE International Conference on Computer Vision (ICCV 2001), Vol. 2. 416–423. https://doi.org/10.1109/ICCV.2001.937655
[14]
Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. 2016. Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440 (2016).
[15]
Yoshifumi Nakano, Takaaki Saeki, Shinnosuke Takamichi, Katsuhito Sudoh, and Hiroshi Saruwatari. 2023. vTTS: visual-text to speech. In 2022 IEEE Spoken Language Technology Workshop (SLT). IEEE, 936–942.
[16]
Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G Okuno, and Tetsuya Ogata. 2015. Audio-visual speech recognition using deep learning. Applied Intelligence 42 (2015), 722–737.
[17]
NVIDIA. 2008. cuBLAS Library. NVIDIA Corporation, Santa Clara, California.
[18]
NVIDIA. 2014. cuSPARSE Library. NVIDIA Corporation, Santa Clara, California.
[19]
Daniel W Otter, Julian R Medina, and Jugal K Kalita. 2020. A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems 32, 2 (2020), 604–624.
[20]
Niall O’Mahony, Sean Campbell, Anderson Carvalho, Suman Harapanahalli, Gustavo Velasco Hernandez, Lenka Krpalkova, Daniel Riordan, and Joseph Walsh. 2020. Deep learning vs. traditional computer vision. In Advances in Computer Vision: Proceedings of the 2019 Computer Vision Conference (CVC), Volume 1. Springer, 128–144.
[21]
Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. 2017. SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture (Toronto, ON, Canada) (ISCA ’17). ACM, New York, NY, USA, 27–40. https://doi.org/10.1145/3079856.3080254
[22]
Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. 2017. SCNN: An accelerator for compressed-sparse convolutional neural networks. ACM SIGARCH Computer Architecture News 45, 2 (2017), 27–40.
[23]
Masuma Akter Rumi, Xiaolong Ma, Yanzhi Wang, and Peng Jiang. 2020. Accelerating sparse CNN inference on GPUs with performance-aware weight pruning. In Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques. 267–278.
[24]
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. 2014. ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575 [cs] (Sept. 2014).
[25]
Marius Octavian Stan. 2022. HPIPE-NX: Leveraging tensor blocks for high-performance CNN inference acceleration on FPGAs. Ph.D. Dissertation. University of Toronto (Canada).
[26]
Shuoheng Yang, Yuxin Wang, and Xiaowen Chu. 2020. A survey of deep learning techniques for neural machine translation. arXiv preprint arXiv:2002.07526 (2020).
[27]
Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang, and Lanshun Nie. 2019. Balanced sparsity for efficient DNN inference on GPU. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 5676–5683.
[28]
Kai Zhang, Wangmeng Zuo, Shuhang Gu, and Lei Zhang. 2017. Learning Deep CNN Denoiser Prior for Image Restoration. In IEEE Conference on Computer Vision and Pattern Recognition. 3929–3938.
[29]
Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. 2016. Cambricon-X: An accelerator for sparse neural networks. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 1–12.
[30]
Hongyu Zhu, Chao Xie, Yeqi Fei, and Huanjie Tao. 2021. Attention mechanisms in CNN-based single image super-resolution: A brief review and a new perspective. Electronics 10, 10 (2021), 1187.


          Published In

          HEART '23: Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies
June 2023, 127 pages
ISBN: 9798400700439
DOI: 10.1145/3597031

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

1. Accelerator
2. CUDA
3. Graphics processing unit (GPU)
4. Sparse Convolutional Neural Network (SCNN)


          Conference

          HEART 2023

          Acceptance Rates

          Overall Acceptance Rate 22 of 50 submissions, 44%
