ABSTRACT
The emergence of low-power accelerators has enabled deep learning models to be executed on mobile and embedded edge devices without relying on cloud resources. The energy-constrained nature of these devices requires a judicious choice of deep learning model and system configuration parameters to meet application needs while minimizing the energy consumed during inference.
In this paper, we carry out an experimental evaluation of more than 40 popular pretrained deep learning models to characterize trends in their accuracy, latency, and energy use when running on edge accelerators. Our results show that as models have grown in size, the marginal increase in their accuracy has come at a much higher energy cost. Consequently, simply choosing the most accurate model for a task is no longer the best strategy; the application designer must weigh the tradeoff between latency, accuracy, and energy use to make an appropriate choice. Since the relationship among these metrics is non-linear, we present a recommendation algorithm that enables application designers to choose the deep learning model that best meets an application's energy budget constraints. Our evaluation shows that this technique provides recommendations within 3 to 7% of the specified budget while maximizing accuracy and minimizing energy.
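The core selection step described above can be illustrated with a minimal sketch: given per-model profiles of accuracy, latency, and energy, pick the most accurate model whose energy fits the budget (optionally also meeting a latency target), breaking ties toward lower energy. The model names and numbers below are hypothetical placeholders, not measurements from the paper, and this is a simplified stand-in for the paper's recommendation algorithm, not its actual implementation.

```python
from typing import NamedTuple, Optional

class Profile(NamedTuple):
    name: str
    accuracy: float    # top-1 accuracy (%)
    latency_ms: float  # per-inference latency (ms)
    energy_mj: float   # per-inference energy (millijoules)

# Hypothetical profiling results for illustration only.
PROFILES = [
    Profile("model_small",  71.9, 12.0,  35.0),
    Profile("model_medium", 76.1, 28.0, 110.0),
    Profile("model_large",  82.9, 95.0, 410.0),
]

def recommend(profiles, energy_budget_mj, latency_slo_ms=None):
    """Return the most accurate model whose per-inference energy fits
    the budget (and, if given, whose latency meets the SLO); among
    equally accurate candidates, prefer the one using less energy."""
    feasible = [p for p in profiles
                if p.energy_mj <= energy_budget_mj
                and (latency_slo_ms is None or p.latency_ms <= latency_slo_ms)]
    if not feasible:
        return None  # no model fits the budget
    return max(feasible, key=lambda p: (p.accuracy, -p.energy_mj))

print(recommend(PROFILES, energy_budget_mj=150.0).name)  # model_medium
```

With a 150 mJ budget, the sketch skips the large model (410 mJ) and returns the medium one, since it is the most accurate feasible choice; because the accuracy/energy relation is non-linear, this feasibility filter can change the recommendation dramatically as the budget moves.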
Index Terms
- Design Considerations for Energy-efficient Inference on Edge Devices