Research Article
DOI: 10.1145/3345768.3355917

SEE: Scheduling Early Exit for Mobile DNN Inference during Service Outage

Published: 25 November 2019

Abstract

In recent years, the rapid development of edge computing has enabled us to process a wide variety of intelligent applications at the edge, such as real-time video analytics. However, edge computing may suffer from service outages caused by fluctuating wireless connections or congested computing resources. During a service outage, the only choice is to process deep neural network (DNN) inference on the local mobile device. The obstacle is that, due to limited resources, it may not be possible to complete inference tasks on time. Inspired by the recently developed early exit of DNNs, where a DNN can exit at an earlier layer to shorten the inference delay by sacrificing an acceptable level of accuracy, we propose to adopt this mechanism to process inference tasks during a service outage. The challenge is how to obtain the optimal schedule given diverse early-exit choices. To this end, we formulate an optimal scheduling problem with the objective of maximizing a general overall utility. However, the problem takes the form of an integer program, which cannot be solved by standard approaches. We therefore prove the Ordered Scheduling structure, which indicates that a frame that arrives earlier must be scheduled earlier. This structure greatly reduces the search space for an optimal solution. We then propose the Scheduling Early Exit (SEE) algorithm, based on dynamic programming, to solve the problem optimally with polynomial computational complexity. Finally, we conduct trace-driven simulations and compare SEE with two benchmarks. The results show that SEE outperforms the benchmarks by 50.9%.
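
To make the scheduling idea concrete, below is a minimal sketch (not the authors' implementation) of dynamic-programming early-exit selection under the Ordered Scheduling structure described in the abstract: frames are considered in arrival order, and for each frame the scheduler either drops it or picks one of its early-exit points so that total utility is maximized subject to deadlines. The names (Frame, schedule_early_exits), the per-exit (processing time, utility) pairs, and the integer time slots are illustrative assumptions, and the memoization over finish times is pseudo-polynomial rather than the paper's exact formulation.

```python
# A minimal, hypothetical sketch of dynamic-programming early-exit scheduling
# under an "earlier arrival is scheduled earlier" (Ordered Scheduling) rule.
# Frame fields, utilities, and the integer time grid are illustrative
# assumptions, not the paper's exact model.

from dataclasses import dataclass
from functools import lru_cache
from typing import List, Tuple


@dataclass(frozen=True)
class Frame:
    arrival: int                          # arrival time slot
    deadline: int                         # latest slot by which inference must finish
    exits: Tuple[Tuple[int, float], ...]  # (processing time, utility) per early-exit point


def schedule_early_exits(frames: List[Frame]) -> Tuple[float, List[int]]:
    """Return (maximum total utility, chosen exit index per frame; -1 = frame dropped)."""
    frames = sorted(frames, key=lambda f: f.arrival)  # ordered scheduling: arrival order

    @lru_cache(maxsize=None)
    def best(i: int, t: int) -> Tuple[float, Tuple[int, ...]]:
        # Best utility achievable for frames[i:] when the device is free at slot t.
        if i == len(frames):
            return 0.0, ()
        frame = frames[i]
        start = max(t, frame.arrival)
        # Option 1: drop the frame (utility 0 for it).
        rest_u, rest_c = best(i + 1, t)
        best_u, best_c = rest_u, (-1,) + rest_c
        # Option 2: run it up to one of its exit points, if the deadline allows.
        for k, (proc, util) in enumerate(frame.exits):
            finish = start + proc
            if finish <= frame.deadline:
                sub_u, sub_c = best(i + 1, finish)
                if util + sub_u > best_u:
                    best_u, best_c = util + sub_u, (k,) + sub_c
        return best_u, best_c

    total, choices = best(0, 0)
    return total, list(choices)


if __name__ == "__main__":
    # Three frames sharing two candidate exits: a fast, lower-accuracy exit
    # and a slower, higher-accuracy one.
    frames = [
        Frame(arrival=0, deadline=6, exits=((2, 0.6), (5, 0.9))),
        Frame(arrival=1, deadline=7, exits=((2, 0.6), (5, 0.9))),
        Frame(arrival=3, deadline=9, exits=((2, 0.6), (5, 0.9))),
    ]
    print(schedule_early_exits(frames))
```

Because frames are only ever considered in arrival order, the state space reduces to (next frame index, device-free time), which is what makes a dynamic-programming solution tractable; the paper's SEE algorithm exploits the same Ordered Scheduling property to achieve polynomial complexity.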

    Published In

    MSWIM '19: Proceedings of the 22nd International ACM Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems
    November 2019, 340 pages
    ISBN: 9781450369046
    DOI: 10.1145/3345768

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Badges

    • Best Paper

    Author Tags

    1. computation offloading
    2. dnn inference
    3. early exit
    4. edge computing

    Conference

    MSWiM '19

    Acceptance Rates

    Overall Acceptance Rate: 398 of 1,577 submissions, 25%

    Cited By

    • (2024) Dynamic Batching and Early-Exiting for Accurate and Timely Edge Inference. 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring), 1-6. DOI: 10.1109/VTC2024-Spring62846.2024.10682995. Online publication date: 24-Jun-2024.
    • (2024) Getting the Best Out of Both Worlds: Algorithms for Hierarchical Inference at the Edge. IEEE Transactions on Machine Learning in Communications and Networking 2, 280-297. DOI: 10.1109/TMLCN.2024.3366501. Online publication date: 2024.
    • (2023) Online task assignment with controllable processing time. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 5466-5474. DOI: 10.24963/ijcai.2023/607. Online publication date: 19-Aug-2023.
    • (2023) SplitEE: Early Exit in Deep Neural Networks with Split Computing. Proceedings of the Third International Conference on AI-ML Systems, 1-9. DOI: 10.1145/3639856.3639873. Online publication date: 25-Oct-2023.
    • (2023) Offloading Algorithms for Maximizing Inference Accuracy on Edge Device in an Edge Intelligence System. IEEE Transactions on Parallel and Distributed Systems 34(7), 2025-2039. DOI: 10.1109/TPDS.2023.3267458. Online publication date: Jul-2023.
    • (2023) On-demand Edge Inference Scheduling with Accuracy and Deadline Guarantee. 2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS), 1-10. DOI: 10.1109/IWQoS57198.2023.10188769. Online publication date: 19-Jun-2023.
    • (2023) AdaEE: Adaptive Early-Exit DNN Inference Through Multi-Armed Bandits. ICC 2023 - IEEE International Conference on Communications, 3726-3731. DOI: 10.1109/ICC45041.2023.10279243. Online publication date: 28-May-2023.
    • (2023) Distributed Artificial Intelligence Empowered by End-Edge-Cloud Computing: A Survey. IEEE Communications Surveys & Tutorials 25(1), 591-624. DOI: 10.1109/COMST.2022.3218527. Online publication date: Sep-2024.
    • (2022) Unsupervised Early Exit in DNNs with Multiple Exits. Proceedings of the Second International Conference on AI-ML Systems, 1-9. DOI: 10.1145/3564121.3564137. Online publication date: 12-Oct-2022.
    • (2022) Semi-Online Multi-Machine with Restart Scheduling for Integrated Edge and Cloud Computing Systems. Proceedings of the 51st International Conference on Parallel Processing, 1-13. DOI: 10.1145/3545008.3545059. Online publication date: 29-Aug-2022.
