Research Article
DOI: 10.1145/3345768.3355917

SEE: Scheduling Early Exit for Mobile DNN Inference during Service Outage

Published: 25 November 2019

Abstract

In recent years, the rapid development of edge computing has enabled us to process a wide variety of intelligent applications at the edge, such as real-time video analytics. However, edge computing may suffer from service outages caused by fluctuating wireless connections or congested computing resources. During a service outage, the only choice is to process deep neural network (DNN) inference on the local mobile device. The obstacle is that, due to limited resources, it may not be possible to complete inference tasks on time. Inspired by the recently developed early exit of DNNs, where a DNN can exit at an earlier layer to shorten the inference delay by sacrificing an acceptable level of accuracy, we propose to adopt this mechanism to process inference tasks during a service outage. The challenge is how to obtain the optimal schedule given diverse early-exit choices. To this end, we formulate an optimal scheduling problem with the objective of maximizing a general overall utility. However, the problem takes the form of an integer program, which cannot be solved by standard approaches. We therefore prove the Ordered Scheduling structure, which indicates that a frame that arrives earlier must be scheduled earlier. This structure greatly reduces the search space for an optimal solution. We then propose the Scheduling Early Exit (SEE) algorithm, based on dynamic programming, to solve the problem optimally with polynomial computational complexity. Finally, we conduct trace-driven simulations and compare SEE with two benchmarks. The results show that SEE outperforms the benchmarks by 50.9%.
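
To make the scheduling idea concrete, below is a minimal sketch (not the authors' implementation) of dynamic-programming early-exit selection under the Ordered Scheduling structure described in the abstract: frames are considered in arrival order, and for each frame the scheduler either drops it or picks one of its early-exit points so that total utility is maximized subject to deadlines. The names (Frame, schedule_early_exits), the per-exit (processing time, utility) pairs, and the integer time slots are illustrative assumptions, and the memoization over finish times is pseudo-polynomial rather than the paper's exact formulation.

```python
# A minimal, hypothetical sketch of dynamic-programming early-exit scheduling
# under an "earlier arrival is scheduled earlier" (Ordered Scheduling) rule.
# Frame fields, utilities, and the integer time grid are illustrative
# assumptions, not the paper's exact model.

from dataclasses import dataclass
from functools import lru_cache
from typing import List, Tuple


@dataclass(frozen=True)
class Frame:
    arrival: int                          # arrival time slot
    deadline: int                         # latest slot by which inference must finish
    exits: Tuple[Tuple[int, float], ...]  # (processing time, utility) per early-exit point


def schedule_early_exits(frames: List[Frame]) -> Tuple[float, List[int]]:
    """Return (maximum total utility, chosen exit index per frame; -1 = frame dropped)."""
    frames = sorted(frames, key=lambda f: f.arrival)  # ordered scheduling: arrival order

    @lru_cache(maxsize=None)
    def best(i: int, t: int) -> Tuple[float, Tuple[int, ...]]:
        # Best utility achievable for frames[i:] when the device is free at slot t.
        if i == len(frames):
            return 0.0, ()
        frame = frames[i]
        start = max(t, frame.arrival)
        # Option 1: drop the frame (utility 0 for it).
        rest_u, rest_c = best(i + 1, t)
        best_u, best_c = rest_u, (-1,) + rest_c
        # Option 2: run it up to one of its exit points, if the deadline allows.
        for k, (proc, util) in enumerate(frame.exits):
            finish = start + proc
            if finish <= frame.deadline:
                sub_u, sub_c = best(i + 1, finish)
                if util + sub_u > best_u:
                    best_u, best_c = util + sub_u, (k,) + sub_c
        return best_u, best_c

    total, choices = best(0, 0)
    return total, list(choices)


if __name__ == "__main__":
    # Three frames sharing two candidate exits: a fast, lower-accuracy exit
    # and a slower, higher-accuracy one.
    frames = [
        Frame(arrival=0, deadline=6, exits=((2, 0.6), (5, 0.9))),
        Frame(arrival=1, deadline=7, exits=((2, 0.6), (5, 0.9))),
        Frame(arrival=3, deadline=9, exits=((2, 0.6), (5, 0.9))),
    ]
    print(schedule_early_exits(frames))
```

Because frames are only ever considered in arrival order, the state space reduces to (next frame index, device-free time), which is what makes a dynamic-programming solution tractable; the paper's SEE algorithm exploits the same Ordered Scheduling property to achieve polynomial complexity.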

    Published In

    MSWIM '19: Proceedings of the 22nd International ACM Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems
    November 2019, 340 pages
    ISBN: 9781450369046
    DOI: 10.1145/3345768

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Badges

    • Best Paper

    Author Tags

    1. computation offloading
    2. dnn inference
    3. early exit
    4. edge computing

    Conference

    MSWiM '19

    Acceptance Rates

    Overall Acceptance Rate: 398 of 1,577 submissions, 25%

    Cited By

    • (2024) Dynamic Batching and Early-Exiting for Accurate and Timely Edge Inference. 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring), 1-6. DOI: 10.1109/VTC2024-Spring62846.2024.10682995. Online publication date: 24-Jun-2024.
    • (2024) Getting the Best Out of Both Worlds: Algorithms for Hierarchical Inference at the Edge. IEEE Transactions on Machine Learning in Communications and Networking 2, 280-297. DOI: 10.1109/TMLCN.2024.3366501. Online publication date: 2024.
    • (2023) Online task assignment with controllable processing time. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 5466-5474. DOI: 10.24963/ijcai.2023/607. Online publication date: 19-Aug-2023.
    • (2023) SplitEE: Early Exit in Deep Neural Networks with Split Computing. Proceedings of the Third International Conference on AI-ML Systems, 1-9. DOI: 10.1145/3639856.3639873. Online publication date: 25-Oct-2023.
    • (2023) Offloading Algorithms for Maximizing Inference Accuracy on Edge Device in an Edge Intelligence System. IEEE Transactions on Parallel and Distributed Systems 34(7), 2025-2039. DOI: 10.1109/TPDS.2023.3267458. Online publication date: Jul-2023.
    • (2023) On-demand Edge Inference Scheduling with Accuracy and Deadline Guarantee. 2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS), 1-10. DOI: 10.1109/IWQoS57198.2023.10188769. Online publication date: 19-Jun-2023.
    • (2023) AdaEE: Adaptive Early-Exit DNN Inference Through Multi-Armed Bandits. ICC 2023 - IEEE International Conference on Communications, 3726-3731. DOI: 10.1109/ICC45041.2023.10279243. Online publication date: 28-May-2023.
    • (2023) Distributed Artificial Intelligence Empowered by End-Edge-Cloud Computing: A Survey. IEEE Communications Surveys & Tutorials 25(1), 591-624. DOI: 10.1109/COMST.2022.3218527. Online publication date: Sep-2024.
    • (2022) Unsupervised Early Exit in DNNs with Multiple Exits. Proceedings of the Second International Conference on AI-ML Systems, 1-9. DOI: 10.1145/3564121.3564137. Online publication date: 12-Oct-2022.
    • (2022) Semi-Online Multi-Machine with Restart Scheduling for Integrated Edge and Cloud Computing Systems. Proceedings of the 51st International Conference on Parallel Processing, 1-13. DOI: 10.1145/3545008.3545059. Online publication date: 29-Aug-2022.
