An Offloading Algorithm for Maximizing Inference Accuracy on Edge Device in an Edge Intelligence System

ABSTRACT
With the emergence of edge computing, the problem of offloading jobs between an Edge Device (ED) and an Edge Server (ES) has received significant attention. Motivated by the fact that an increasing number of applications use Machine Learning (ML) inference on data samples collected at the EDs, we study the problem of offloading inference jobs, which differ from typical computational jobs in two novel aspects: 1) both the inference accuracy and the processing time of an inference job increase with the size of the ML model, and 2) recently proposed Deep Neural Networks (DNNs) for resource-constrained EDs offer the choice of scaling down the model size by trading off inference accuracy. We therefore consider a setting in which multiple small-size ML models are available at the ED and a powerful large-size ML model is available at the ES, and study a general assignment problem whose objective is to maximize the total inference accuracy over the data samples at the ED, subject to a constraint T on the makespan. Since the problem is NP-hard, we propose an approximation algorithm, Accuracy Maximization using LP-Relaxation and Rounding (AMR2), and prove that it results in a makespan of at most 2T while achieving a total accuracy that is lower than the optimal total accuracy by only a small constant. As a proof of concept, we implemented AMR2 on a Raspberry Pi equipped with MobileNets, connected via LAN to a server equipped with ResNet, and studied the total accuracy and makespan performance of AMR2 for image classification.
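To make the offloading trade-off concrete, the sketch below uses a deliberately simplified greedy rule rather than the paper's AMR2 algorithm (which solves an LP relaxation and rounds it, with a proven 2T makespan bound). All numbers and the function name are hypothetical: each sample earns the large ES model's accuracy if offloaded, or the small ED model's accuracy if processed locally, and samples are offloaded only as long as the ES can finish them within the target T.

```python
# Simplified greedy sketch of the ED/ES inference-offloading trade-off
# described in the abstract. NOT the paper's AMR2; a_ed, a_es, t_ed, t_es
# and T are illustrative placeholders.

def greedy_offload(n, a_ed, a_es, t_ed, t_es, T):
    """Assign each of n data samples to the ED or the ES.

    a_ed, a_es : per-sample accuracy of the small (ED) and large (ES) model.
    t_ed, t_es : per-sample processing time on the ED, and
                 transmission-plus-processing time on the ES.
    T          : target makespan.
    Offload as many samples as the ES can finish within T (each gains
    a_es - a_ed accuracy over local processing); run the rest on the ED.
    """
    offloaded = min(n, int(T // t_es))   # samples the ES can complete by T
    local = n - offloaded                # remaining samples stay on the ED
    # ED and ES work in parallel, so the makespan is the slower of the two.
    makespan = max(offloaded * t_es, local * t_ed)
    accuracy = offloaded * a_es + local * a_ed
    return offloaded, local, makespan, accuracy

# Example: 20 samples, small model 70% vs large model 92% accurate,
# 50 ms per local inference, 500 ms per offloaded inference, T = 4 s.
offloaded, local, makespan, accuracy = greedy_offload(
    20, 0.70, 0.92, 0.05, 0.5, 4.0)
```

With these placeholder numbers, 8 of the 20 samples are offloaded and the rest run locally; AMR2 improves on such greedy heuristics by choosing the assignment via an LP relaxation, which yields the stated accuracy guarantee.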