ABSTRACT
The emergence of low-power accelerators has enabled deep learning models to be executed on mobile and embedded edge devices without relying on cloud resources. The energy-constrained nature of these devices requires a judicious choice of deep learning model and system configuration parameters to meet application needs while minimizing the energy consumed during inference.
In this paper, we carry out an experimental evaluation of more than 40 popular pretrained deep learning models to characterize trends in their accuracy, latency, and energy use when running on edge accelerators. Our results show that as models have grown in size, the marginal increase in their accuracy has come at a much higher energy cost. Consequently, simply choosing the most accurate model for a task is no longer the best strategy; the application designer must weigh the tradeoff between latency, accuracy, and energy use to make an appropriate choice. Since the relationship among these metrics is non-linear, we present a recommendation algorithm that enables application designers to choose the deep learning model that best meets an application's energy budget constraints. Our evaluation shows that this technique provides recommendations within 3 to 7% of the specified budget while maximizing accuracy and minimizing energy.
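The core selection step described above can be illustrated with a minimal sketch: given per-model profiles of accuracy, latency, and energy, pick the most accurate model whose energy fits the budget (optionally also meeting a latency target), breaking ties toward lower energy. The model names and numbers below are hypothetical placeholders, not measurements from the paper, and this is a simplified stand-in for the paper's recommendation algorithm, not its actual implementation.

```python
from typing import NamedTuple, Optional

class Profile(NamedTuple):
    name: str
    accuracy: float    # top-1 accuracy (%)
    latency_ms: float  # per-inference latency (ms)
    energy_mj: float   # per-inference energy (millijoules)

# Hypothetical profiling results for illustration only.
PROFILES = [
    Profile("model_small",  71.9, 12.0,  35.0),
    Profile("model_medium", 76.1, 28.0, 110.0),
    Profile("model_large",  82.9, 95.0, 410.0),
]

def recommend(profiles, energy_budget_mj, latency_slo_ms=None):
    """Return the most accurate model whose per-inference energy fits
    the budget (and, if given, whose latency meets the SLO); among
    equally accurate candidates, prefer the one using less energy."""
    feasible = [p for p in profiles
                if p.energy_mj <= energy_budget_mj
                and (latency_slo_ms is None or p.latency_ms <= latency_slo_ms)]
    if not feasible:
        return None  # no model fits the budget
    return max(feasible, key=lambda p: (p.accuracy, -p.energy_mj))

print(recommend(PROFILES, energy_budget_mj=150.0).name)  # model_medium
```

With a 150 mJ budget, the sketch skips the large model (410 mJ) and returns the medium one, since it is the most accurate feasible choice; because the accuracy/energy relation is non-linear, this feasibility filter can change the recommendation dramatically as the budget moves.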
Index Terms
- Design Considerations for Energy-efficient Inference on Edge Devices