DOI: 10.1145/3551659.3559044
research-article

An Offloading Algorithm for Maximizing Inference Accuracy on Edge Device in an Edge Intelligence System

Published: 24 October 2022

ABSTRACT

With the emergence of edge computing, the problem of offloading jobs between an Edge Device (ED) and an Edge Server (ES) has received significant attention in the past. Motivated by the fact that an increasing number of applications use Machine Learning (ML) inference on data samples collected at EDs, we study the problem of offloading inference jobs by considering the following novel aspects: in contrast to a typical computational job, 1) both the inference accuracy and the processing time of an inference job increase with the size of the ML model, and 2) recently proposed Deep Neural Networks (DNNs) for resource-constrained EDs offer the choice of scaling down the model size by trading off inference accuracy. We therefore consider a setting in which multiple small-size ML models are available at the ED and a powerful large-size ML model is available at the ES, and study a general assignment problem whose objective is to maximize the total inference accuracy of the data samples at the ED subject to a time constraint T on the makespan. Noting that the problem is NP-hard, we propose an approximation algorithm, Accuracy Maximization using LP-Relaxation and Rounding (AMR2), and prove that it results in a makespan of at most 2T while achieving a total accuracy that is lower than the optimal total accuracy by only a small constant. As a proof of concept, we implemented AMR2 on a Raspberry Pi equipped with MobileNets and connected via LAN to a server equipped with ResNet, and studied the total accuracy and makespan performance of AMR2 for image classification.
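The abstract describes AMR2 only at a high level. As a rough illustration of the LP-relaxation-and-rounding idea it names, the sketch below solves a deliberately simplified variant of the problem: each sample is either classified locally by a single small model or offloaded to the ES, each side must finish within the makespan budget T, and the goal is to maximize total expected accuracy. The instance data, the single-local-model simplification, and the rounding rule are assumptions made for illustration; this is not the authors' AMR2 implementation or the paper's exact multi-model formulation.

```python
# Illustrative sketch of LP-relaxation-and-rounding for accuracy-maximizing
# offloading. NOT the paper's AMR2: the instance, the single-local-model
# simplification, and the rounding rule are assumptions for illustration only.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n = 8        # data samples waiting at the edge device (ED)
T = 4.0      # makespan budget in seconds (assumed)

# Assumed per-sample statistics for the two options.
acc_local = rng.uniform(0.60, 0.80, n)   # expected accuracy of the small ED model
acc_es    = rng.uniform(0.85, 0.95, n)   # expected accuracy of the large ES model
t_local   = rng.uniform(0.30, 0.80, n)   # ED inference time per sample
t_es      = rng.uniform(0.50, 1.00, n)   # transmission + ES inference time per sample

# Relaxed decision variables x_i in [0, 1]: "fraction" of sample i offloaded.
# Maximizing sum_i [(1 - x_i) acc_local_i + x_i acc_es_i] is equivalent to
# maximizing (acc_es - acc_local) . x, so we minimize its negation.
c = -(acc_es - acc_local)

# Both sides must finish within T (ED and ES work in parallel):
#   ED: sum_i (1 - x_i) t_local_i <= T   ->   -t_local . x <= T - sum(t_local)
#   ES: sum_i x_i t_es_i          <= T
A_ub = np.vstack([-t_local, t_es])
b_ub = np.array([T - t_local.sum(), T])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * n, method="highs")
assert res.success, res.message

# Rounding: with only two time constraints, a basic optimal solution has at
# most two fractional entries; rounding them can overshoot a budget by at most
# those jobs' remaining fractions, which echoes the 2T-style guarantee stated
# in the abstract (the formal argument is in the paper, not reproduced here).
x = np.round(res.x).astype(int)

total_acc = np.where(x == 1, acc_es, acc_local).sum()
print("offload decisions:", x)
print("total expected accuracy:", round(total_acc, 3))
print("ED busy time:", round(t_local[x == 0].sum(), 3),
      "| ES busy time:", round(t_es[x == 1].sum(), 3))
```

In the paper's full setting the ED can additionally choose among several small models per sample, which turns the relaxation into a generalized-assignment-style linear program, but the relax-then-round structure sketched above stays the same.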


Published in

MSWiM '22: Proceedings of the 25th International ACM Conference on Modeling Analysis and Simulation of Wireless and Mobile Systems
October 2022, 243 pages
ISBN: 9781450394826
DOI: 10.1145/3551659

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 24 October 2022


      Qualifiers

      • research-article

      Acceptance Rates

MSWiM '22 Paper Acceptance Rate: 27 of 117 submissions, 23%
Overall Acceptance Rate: 398 of 1,577 submissions, 25%
