research-article

Interpretable ML enhanced CNN Performance Analysis of cuBLAS, cuDNN and TensorRT

Authors:

Zhumakhan Nazir,

Vladislav Yarovenko,

Jurn-Gyu Park

SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing

Pages 1260 - 1265

https://doi.org/10.1145/3555776.3578729

Published: 07 June 2023

Abstract

Deep learning models such as convolutional neural networks (CNNs) have a wide range of perception applications, including image classification and object detection. However, even for the same CNN architecture, inference performance differs depending on the library used to implement it, such as cuBLAS, cuDNN, or TensorRT. To investigate the performance effects of these state-of-the-art GPU libraries, this paper presents a case study that compares and analyzes cuBLAS, cuDNN, and TensorRT implementations of YOLOv4-tiny. It introduces the nvprof metrics that are crucial for a fair comparison and for explaining the performance differences, and it proposes an analysis based on interpretable machine learning (ML) models. Our interpretable ML models achieve 100% accuracy in the classification task and a MAPE of 0.0094 in the regression task.
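For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of how classification accuracy and mean absolute percentage error (MAPE) are commonly computed. The function names, labels, and numbers are illustrative assumptions and are not taken from the paper or its dataset. MAPE is returned here as a fraction rather than a percentage, which is the convention of scikit-learn's `mean_absolute_percentage_error`; the abstract does not state which convention the authors use.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (not x100)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: class labels and per-layer timings are made up.
labels_true = ["cuBLAS", "cuDNN", "TensorRT", "cuDNN"]
labels_pred = ["cuBLAS", "cuDNN", "TensorRT", "cuDNN"]
times_true = [10.0, 20.0, 40.0]
times_pred = [10.1, 19.8, 40.4]

acc = accuracy(labels_true, labels_pred)
err = mape(times_true, times_pred)
print(f"accuracy = {acc:.2%}, MAPE = {err:.4f}")
```

In this toy example every relative error is about 1%, so the MAPE is close to 0.01; a reported MAPE of 0.0094 under the fractional convention would correspond to an average relative error of just under 1%.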

Cited By

Guerrouj F, Rodríguez Flórez S, El Ouardi A, Abouzahir M, Ramzi M (2024). Optimizing Convolution Operations for YOLOv4-based Object Detection on GPU. ITM Web of Conferences 69 (04008). Online publication date: 13-Dec-2024. https://doi.org/10.1051/itmconf/20246904008

Published In

SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing

March 2023

1932 pages

ISBN: 9781450395175

DOI: 10.1145/3555776

Conference Chairs: Jiman Hong (Soongsil University, South Korea), Maart Lanperne (Tallinn University, Estonia)

Program Chairs: Juw Won Park (University of Louisville, USA), Tomas Cerny (Baylor University, USA)

Publication Chair: Hossain Shahriar (Kennesaw State University, USA)

Copyright © 2023 ACM.

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Nazarbayev University

Conference

SAC '23: 38th ACM/SIGAPP Symposium on Applied Computing

March 27 - 31, 2023

Tallinn, Estonia

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
