DOI: 10.1145/3576842.3582375
research-article

Dělen: Enabling Flexible and Adaptive Model-serving for Multi-tenant Edge AI

Published: 09 May 2023

ABSTRACT

Model-serving systems expose machine learning (ML) models to applications programmatically via a high-level API. Cloud platforms use these systems to mask the complexities of optimally managing resources and servicing inference requests across multiple applications. Model serving at the edge is now also becoming increasingly important to support inference workloads with tight latency requirements. However, edge model serving differs substantially from cloud model serving in its latency, energy, and accuracy constraints: these systems must support multiple applications with widely different latency and accuracy requirements on embedded edge accelerators with limited computational and energy resources.

To address this problem, this paper presents Dělen, a flexible and adaptive model-serving system for multi-tenant edge AI. Dělen exposes a high-level API that enables individual edge applications to specify a bound at runtime on the latency, accuracy, or energy of their inference requests. We efficiently implement Dělen using conditional execution in multi-exit deep neural networks (DNNs), which enables granular control over inference requests, and evaluate it on a resource-constrained Jetson Nano edge accelerator. We evaluate Dělen's flexibility by implementing state-of-the-art adaptation policies using its API, and evaluate its adaptability under different workload dynamics and goals when running single and multiple applications.
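To make the idea of bounding a request on a multi-exit DNN concrete, the following is a minimal sketch, not Dělen's actual API. It assumes each exit has been profiled offline for latency, energy, and accuracy, and picks the exit that satisfies the bounds a caller supplies. The names `ExitProfile` and `select_exit`, the parameter names, and all numbers in `PROFILES` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ExitProfile:
    """Offline-profiled cost of stopping at one exit (all values hypothetical)."""
    index: int
    latency_ms: float
    energy_mj: float
    accuracy: float


def select_exit(
    profiles: Sequence[ExitProfile],
    max_latency_ms: Optional[float] = None,
    max_energy_mj: Optional[float] = None,
    min_accuracy: Optional[float] = None,
) -> Optional[ExitProfile]:
    """Return the exit to stop at, or None if no exit satisfies every bound."""
    feasible = [
        p for p in profiles
        if (max_latency_ms is None or p.latency_ms <= max_latency_ms)
        and (max_energy_mj is None or p.energy_mj <= max_energy_mj)
        and (min_accuracy is None or p.accuracy >= min_accuracy)
    ]
    if not feasible:
        return None
    if min_accuracy is not None:
        # Accuracy is bounded from below: take the cheapest exit that meets it.
        return min(feasible, key=lambda p: p.latency_ms)
    # Only cost bounds (or none) were given: take the most accurate exit that fits.
    return max(feasible, key=lambda p: p.accuracy)


# Hypothetical profile of a three-exit network; deeper exits cost more
# but are more accurate.
PROFILES = [
    ExitProfile(index=0, latency_ms=5.0, energy_mj=20.0, accuracy=0.60),
    ExitProfile(index=1, latency_ms=12.0, energy_mj=45.0, accuracy=0.72),
    ExitProfile(index=2, latency_ms=25.0, energy_mj=90.0, accuracy=0.80),
]
```

Under these assumed profiles, a request bounded by `max_latency_ms=15.0` would stop at exit 1, while a request that asks only for `min_accuracy=0.7` would also stop at exit 1 because it is the cheapest exit meeting that accuracy. A real system would additionally have to account for queueing and contention among tenants, which this sketch ignores.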


Published in

IoTDI '23: Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation
May 2023
514 pages
ISBN: 9798400700378
DOI: 10.1145/3576842

Copyright © 2023 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited
