Abstract
Convolutional Neural Networks (CNNs) are now widely used in many fields, particularly for computer vision applications. Edge platforms have drawn tremendous attention from academia and industry thanks to their ability to reduce execution time and preserve privacy. However, edge platforms struggle to satisfy the computational demands of CNNs because of their limited compute and energy budgets. It is therefore challenging to find the most efficient CNN that meets the accuracy, execution-time, energy, and memory-footprint constraints of a target edge platform. Moreover, given the size of the combined design space of CNNs and hardware platforms, evaluating CNN performance exhaustively entails considerable effort. Designers consequently need tools that quickly explore a large design space and select the CNN offering the best performance trade-off for a set of hardware platforms. This article proposes a Machine Learning (ML)-based approach for modeling CNN performance on edge GPU-based platforms for vision applications. We implement and compare five of the most successful ML algorithms for accurate and rapid CNN performance prediction on three different edge GPUs for image classification. Experimental results demonstrate the robustness and usefulness of the proposed methodology. With three of the five ML algorithms, namely XGBoost, Random Forest, and Ridge Polynomial regression, we obtain average errors of 11%, 6%, and 8% for CNN inference execution time, power consumption, and memory usage, respectively.
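To make the modeling workflow concrete, the sketch below fits three of the regressors named in the abstract (XGBoost, Random Forest, and Ridge regression on polynomial features) to predict a single performance metric, such as inference latency, from per-network features. This is a minimal illustration, not the authors' code: the feature set, targets, and hyperparameters here are synthetic placeholders standing in for whatever descriptors (e.g., layer counts, FLOPs, parameters) the paper actually uses.

```python
# Minimal sketch of ML-based CNN performance modeling (assumptions noted inline).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from xgboost import XGBRegressor  # requires the xgboost package

rng = np.random.default_rng(0)
# Hypothetical per-network features (layer count, FLOPs, parameters,
# input resolution); the paper's exact feature set may differ.
X = rng.uniform(size=(500, 4))
# Hypothetical latency target loosely correlated with the features.
y = 5.0 + 40.0 * X[:, 1] + 10.0 * X[:, 2] + rng.normal(0.0, 1.0, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "XGBoost": XGBRegressor(n_estimators=200, max_depth=4),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "Ridge Polynomial": make_pipeline(PolynomialFeatures(degree=2),
                                      Ridge(alpha=1.0)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # Report MAPE, matching the percentage-error style of the abstract.
    mape = mean_absolute_percentage_error(y_te, model.predict(X_te))
    print(f"{name}: MAPE = {100 * mape:.1f}%")
```

In the same spirit, one regressor per metric (execution time, power, memory) would be trained on measurements collected from each target edge GPU, so that new CNN designs can be scored without deploying them.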