An Offloading Algorithm for Maximizing Inference Accuracy on Edge Device in an Edge Intelligence System

ABSTRACT
With the emergence of edge computing, the problem of offloading jobs between an Edge Device (ED) and an Edge Server (ES) has received significant attention. Motivated by the fact that an increasing number of applications use Machine Learning (ML) inference on data samples collected at the EDs, we study the problem of offloading inference jobs, which differ from typical computational jobs in two novel aspects: 1) both the inference accuracy and the processing time of an inference job increase with the size of the ML model, and 2) recently proposed Deep Neural Networks (DNNs) for resource-constrained EDs offer the choice of scaling down the model size by trading off inference accuracy. We therefore consider a setting in which multiple small-size ML models are available at the ED and a powerful large-size ML model is available at the ES, and study a general assignment problem whose objective is to maximize the total inference accuracy over the data samples at the ED, subject to a constraint T on the makespan. Since the problem is NP-hard, we propose an approximation algorithm, Accuracy Maximization using LP-Relaxation and Rounding (AMR2), and prove that it results in a makespan of at most 2T while achieving a total accuracy that is lower than the optimal total accuracy by only a small constant. As a proof of concept, we implemented AMR2 on a Raspberry Pi equipped with MobileNets, connected via LAN to a server equipped with ResNet, and studied the total accuracy and makespan performance of AMR2 for image classification.
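To make the offloading trade-off concrete, the sketch below uses a deliberately simplified greedy rule rather than the paper's AMR2 algorithm (which solves an LP relaxation and rounds it, with a proven 2T makespan bound). All numbers and the function name are hypothetical: each sample earns the large ES model's accuracy if offloaded, or the small ED model's accuracy if processed locally, and samples are offloaded only as long as the ES can finish them within the target T.

```python
# Simplified greedy sketch of the ED/ES inference-offloading trade-off
# described in the abstract. NOT the paper's AMR2; a_ed, a_es, t_ed, t_es
# and T are illustrative placeholders.

def greedy_offload(n, a_ed, a_es, t_ed, t_es, T):
    """Assign each of n data samples to the ED or the ES.

    a_ed, a_es : per-sample accuracy of the small (ED) and large (ES) model.
    t_ed, t_es : per-sample processing time on the ED, and
                 transmission-plus-processing time on the ES.
    T          : target makespan.
    Offload as many samples as the ES can finish within T (each gains
    a_es - a_ed accuracy over local processing); run the rest on the ED.
    """
    offloaded = min(n, int(T // t_es))   # samples the ES can complete by T
    local = n - offloaded                # remaining samples stay on the ED
    # ED and ES work in parallel, so the makespan is the slower of the two.
    makespan = max(offloaded * t_es, local * t_ed)
    accuracy = offloaded * a_es + local * a_ed
    return offloaded, local, makespan, accuracy

# Example: 20 samples, small model 70% vs large model 92% accurate,
# 50 ms per local inference, 500 ms per offloaded inference, T = 4 s.
offloaded, local, makespan, accuracy = greedy_offload(
    20, 0.70, 0.92, 0.05, 0.5, 4.0)
```

With these placeholder numbers, 8 of the 20 samples are offloaded and the rest run locally; AMR2 improves on such greedy heuristics by choosing the assignment via an LP relaxation, which yields the stated accuracy guarantee.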