
Performance Modeling of Computer Vision-based CNN on Edge GPUs


Abstract

Convolutional Neural Networks (CNNs) are now widely used across many fields, particularly in computer vision applications. Edge platforms have drawn tremendous attention from academia and industry thanks to their ability to improve execution time and preserve privacy. However, edge platforms struggle to satisfy CNNs' needs because of their limited computation and energy budgets. It is therefore challenging to find the most efficient CNN that respects the accuracy, time, energy, and memory footprint constraints of a target edge platform. Moreover, given the size of the combined design space of CNNs and hardware platforms, evaluating CNN performance requires substantial effort. Designers consequently need tools to explore this large design space quickly and to select the CNN that offers the best performance trade-off for a set of hardware platforms. This article proposes a Machine Learning (ML)-based modeling approach for CNN performance on edge GPU platforms for vision applications. We implement and compare five of the most successful ML algorithms for accurate and rapid CNN performance prediction on three different edge GPUs, using image classification as the target task. Experimental results demonstrate the robustness and usefulness of the proposed methodology. For three of the five ML algorithms, namely XGBoost, Random Forest, and Ridge Polynomial regression, average errors of 11%, 6%, and 8% were obtained for CNN inference execution time, power consumption, and memory usage, respectively.
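To make the modeling approach concrete, the following is a minimal sketch (not the authors' implementation) of the pipeline the abstract describes: featurize each CNN with architecture descriptors, fit a regressor to a measured performance metric, and report the mean absolute percentage error (MAPE). The feature set, the synthetic data, and the latency relationship below are illustrative assumptions; the paper itself trains on measurements collected from real CNNs running on real edge GPUs.

```python
# A minimal sketch, under assumed features and synthetic data, of ML-based
# CNN performance modeling: descriptors in, measured metric out, MAPE score.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)
n = 200  # number of profiled CNNs (synthetic stand-in here)

# Hypothetical per-CNN descriptors; real data would come from profiling each
# network on the target edge GPU (e.g., via nvprof/tegrastats measurements).
gflops = rng.uniform(0.1, 20.0, n)            # compute cost
params_m = rng.uniform(1.0, 150.0, n)         # parameter count (millions)
depth = rng.integers(10, 200, n)              # number of layers
resolution = rng.choice([224, 299, 331], n)   # input image size
X = np.column_stack([gflops, params_m, depth, resolution])

# Synthetic latency target in ms, loosely tied to the features (demo only).
y = 5.0 + 3.0 * gflops + 0.05 * params_m + rng.normal(0.0, 2.0, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

mape = mean_absolute_percentage_error(y_test, model.predict(X_test))
print(f"Latency MAPE: {100 * mape:.1f}%")
```

Swapping in XGBRegressor from the xgboost package, or make_pipeline(PolynomialFeatures(degree=2), Ridge()) from scikit-learn, covers the other two model families named in the abstract; the training and evaluation code stays the same.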



Published in

ACM Transactions on Embedded Computing Systems, Volume 21, Issue 5 (September 2022), 526 pages
ISSN: 1539-9087; EISSN: 1558-3465; DOI: 10.1145/3561947
Editor: Tulika Mitra


            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 8 October 2022
            • Online AM: 26 March 2022
            • Accepted: 14 March 2022
            • Revised: 13 February 2022
            • Received: 16 July 2021
Published in TECS Volume 21, Issue 5
