DOI: 10.1145/3555776.3578729
research-article

Interpretable ML enhanced CNN Performance Analysis of cuBLAS, cuDNN and TensorRT

Published: 07 June 2023

Abstract

Deep learning models such as convolutional neural networks (CNNs) have a wide range of perception applications, including image classification and object detection. However, even for the same CNN architecture, inference performance differs depending on the library used to implement it, such as cuBLAS, cuDNN, or TensorRT. To investigate the performance effects of these state-of-the-art GPU libraries, this paper presents a case study that compares and analyzes cuBLAS, cuDNN, and TensorRT implementations of YOLOv4-tiny. It introduces the nvprof metrics that are crucial for a fair comparison, explains the reasons behind the observed performance differences, and proposes an analysis based on interpretable machine learning (ML) models. Our interpretable ML models achieve 100% accuracy in the classification task and a mean absolute percentage error (MAPE) of 0.0094 in the regression task.
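
The abstract reports its results as classification accuracy and MAPE, where MAPE = (1/n) · Σ |y_i − ŷ_i| / |y_i| over n samples. The following is a minimal sketch of how these two standard metrics are computed. It is not the authors' evaluation code. The sample labels and timings are hypothetical. The abstract also does not state whether MAPE is reported as a fraction or as a percentage. If it is a fraction (the convention used by scikit-learn's mean_absolute_percentage_error), 0.0094 would correspond to an average error of roughly 0.94%.

```python
"""Standard definitions of the two metrics named in the abstract.

A minimal sketch: the sample values are hypothetical, not data from the paper.
"""
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction (0.01 means 1%)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical library labels, e.g. predicted from per-kernel profiler metrics.
    labels = ["cuBLAS", "cuDNN", "TensorRT", "cuDNN"]
    preds = ["cuBLAS", "cuDNN", "TensorRT", "cuDNN"]
    print(accuracy(labels, preds))  # 1.0

    # Hypothetical measured vs. predicted inference times in milliseconds.
    measured = [10.0, 20.0, 40.0]
    predicted = [10.1, 19.8, 40.0]
    print(round(mape(measured, predicted), 4))  # 0.0067
```

Returning MAPE as a fraction keeps it directly comparable with a figure such as 0.0094. To express it as a percentage, multiply by 100.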

Cited By

• (2024) Optimizing Convolution Operations for YOLOv4-based Object Detection on GPU. ITM Web of Conferences 69, 04008. DOI: 10.1051/itmconf/20246904008. Online publication date: 13 Dec 2024.

Published In

SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing
March 2023
1932 pages
ISBN: 9781450395175
DOI: 10.1145/3555776
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. deep learning
2. CNNs
3. GPU libraries
4. cuBLAS
5. cuDNN
6. TensorRT
7. interpretable ML

Conference

SAC '23

Acceptance Rates

Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%

Bibliometrics

• Downloads (last 12 months): 71
• Downloads (last 6 weeks): 6
Reflects downloads up to 16 Feb 2025
