DOI: 10.1145/3447555.3465326
Research article · Public Access

Design Considerations for Energy-efficient Inference on Edge Devices

Published: 22 June 2021

ABSTRACT

The emergence of low-power accelerators has enabled deep learning models to be executed on mobile or embedded edge devices without relying on cloud resources. The energy-constrained nature of these devices requires a judicious choice of deep learning model and system configuration parameters to meet application needs while optimizing the energy used during deep learning inference.

In this paper, we carry out an experimental evaluation of more than 40 popular pretrained deep learning models to characterize trends in their accuracy, latency, and energy when running on edge accelerators. Our results show that as models have grown in size, the marginal increase in their accuracy has come at a much higher energy cost. Consequently, simply choosing the most accurate model for an application task comes at a higher energy cost; the application designer needs to consider the tradeoff between latency, accuracy, and energy use to make an appropriate choice. Since the relation between these metrics is non-linear, we present a recommendation algorithm to enable application designers to choose the best deep learning model for an application that meets energy budget constraints. Our results show that our technique can provide recommendations that are within 3 to 7% of the specified budget while maximizing accuracy and minimizing energy.
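The recommendation idea described above can be illustrated with a minimal sketch: given per-model profiles of accuracy, latency, and energy measured on an edge accelerator, select the most accurate model that fits within the application's energy and latency budgets, preferring lower energy among equally accurate candidates. The model names, numbers, and function below are hypothetical illustrations, not the paper's actual algorithm or data.

```python
# Hypothetical sketch of budget-constrained model selection.
# All model names and measurements here are illustrative placeholders.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelProfile:
    name: str
    accuracy: float    # top-1 accuracy (fraction)
    latency_ms: float  # mean inference latency on the edge accelerator
    energy_mj: float   # mean energy per inference, in millijoules


def recommend(models: List[ModelProfile],
              energy_budget_mj: float,
              latency_budget_ms: float) -> Optional[ModelProfile]:
    """Return the most accurate model within both budgets, breaking
    accuracy ties by lower energy; None if no model fits."""
    feasible = [m for m in models
                if m.energy_mj <= energy_budget_mj
                and m.latency_ms <= latency_budget_ms]
    if not feasible:
        return None
    # Maximize accuracy first, then prefer the lower-energy model on ties.
    return max(feasible, key=lambda m: (m.accuracy, -m.energy_mj))


profiles = [
    ModelProfile("small-net", 0.71, 8.0, 40.0),
    ModelProfile("mid-net", 0.76, 15.0, 90.0),
    ModelProfile("big-net", 0.78, 45.0, 300.0),
]
best = recommend(profiles, energy_budget_mj=100.0, latency_budget_ms=20.0)
print(best.name)  # mid-net: big-net is more accurate but exceeds both budgets
```

In this toy example the most accurate model is excluded because its marginal accuracy gain carries a disproportionate latency and energy cost, which is exactly the non-linear tradeoff the paper's recommendation algorithm is designed to navigate.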


Published in

e-Energy '21: Proceedings of the Twelfth ACM International Conference on Future Energy Systems
June 2021, 528 pages
ISBN: 9781450383332
DOI: 10.1145/3447555

        Copyright © 2021 ACM


        Publisher

Association for Computing Machinery, New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

Overall acceptance rate: 160 of 446 submissions, 36%
