Abstract
Convolutional Neural Networks (CNNs) are now widely used in many fields, particularly for computer vision applications. Edge platforms have drawn tremendous attention from academia and industry thanks to their ability to reduce execution time and preserve privacy. However, edge platforms struggle to satisfy the computational demands of CNNs because of their limited compute and energy budgets. It is therefore challenging to find the most efficient CNN that meets the accuracy, execution-time, energy, and memory-footprint constraints of a target edge platform. Moreover, given the size of the combined design space of CNNs and hardware platforms, evaluating CNN performance exhaustively entails considerable effort. Designers consequently need tools that quickly explore a large design space and select the CNN offering the best performance trade-off for a set of hardware platforms. This article proposes a Machine Learning (ML)-based approach for modeling CNN performance on edge GPU-based platforms for vision applications. We implement and compare five of the most successful ML algorithms for accurate and rapid CNN performance prediction on three different edge GPUs for image classification. Experimental results demonstrate the robustness and usefulness of the proposed methodology. With three of the five ML algorithms, namely XGBoost, Random Forest, and Ridge Polynomial regression, we obtain average errors of 11%, 6%, and 8% for CNN inference execution time, power consumption, and memory usage, respectively.
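To make the modeling workflow concrete, the sketch below fits three of the regressors named in the abstract (XGBoost, Random Forest, and Ridge regression on polynomial features) to predict a single performance metric, such as inference latency, from per-network features. This is a minimal illustration, not the authors' code: the feature set, targets, and hyperparameters here are synthetic placeholders standing in for whatever descriptors (e.g., layer counts, FLOPs, parameters) the paper actually uses.

```python
# Minimal sketch of ML-based CNN performance modeling (assumptions noted inline).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from xgboost import XGBRegressor  # requires the xgboost package

rng = np.random.default_rng(0)
# Hypothetical per-network features (layer count, FLOPs, parameters,
# input resolution); the paper's exact feature set may differ.
X = rng.uniform(size=(500, 4))
# Hypothetical latency target loosely correlated with the features.
y = 5.0 + 40.0 * X[:, 1] + 10.0 * X[:, 2] + rng.normal(0.0, 1.0, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "XGBoost": XGBRegressor(n_estimators=200, max_depth=4),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "Ridge Polynomial": make_pipeline(PolynomialFeatures(degree=2),
                                      Ridge(alpha=1.0)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # Report MAPE, matching the percentage-error style of the abstract.
    mape = mean_absolute_percentage_error(y_te, model.predict(X_te))
    print(f"{name}: MAPE = {100 * mape:.1f}%")
```

In the same spirit, one regressor per metric (execution time, power, memory) would be trained on measurements collected from each target edge GPU, so that new CNN designs can be scored without deploying them.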