
Sniper: cloud-edge collaborative inference scheduling with neural network similarity modeling

Published: 23 August 2022

Abstract

Cloud-edge collaborative inference requires efficiently scheduling artificial intelligence (AI) tasks onto appropriate edge smart devices. However, continuously iterating deep neural networks (DNNs) and heterogeneous devices pose great challenges for inference task scheduling. In this paper, we propose Sniper, a self-updating cloud-edge collaborative inference scheduling system with time awareness. First, observing that similar networks exhibit similar behaviors, we develop a non-invasive performance characterization network (PCN) based on neural network similarity (NNS) to accurately predict the inference time of DNNs. Moreover, the PCN and time-based scheduling algorithms can be flexibly combined in the scheduling module of Sniper. Experimental results show that the average relative error of network inference time prediction is about 8.06%. Compared with a traditional method without time awareness, Sniper reduces waiting time by 52% on average while achieving a stable increase in throughput.
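As a rough illustration of the abstract's two ideas, the sketch below first predicts an unseen DNN's inference time from its most similar already-profiled network (a crude nearest-neighbor stand-in for the paper's learned PCN), then dispatches tasks to heterogeneous devices by earliest predicted finish time. All feature vectors, device names, and numbers are invented for illustration and are not taken from the paper.

```python
import math

# (1) Similarity-based latency prediction: each profiled network is
# described by a toy feature vector (GFLOPs, Mparams, depth) plus a
# measured latency; an unseen network inherits the latency of its
# nearest profiled neighbour, scaled by the FLOPs ratio.
PROFILED = {
    # name: (features=(gflops, mparams, depth), measured_ms)
    "resnet18":  ((1.8, 11.7, 18), 42.0),
    "resnet50":  ((4.1, 25.6, 50), 98.0),
    "mobilenet": ((0.6, 4.2, 28), 15.0),
}

def _dist(a, b):
    # plain Euclidean distance between feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_latency(features):
    """Nearest-neighbour latency estimate, scaled by the FLOPs ratio."""
    _name, (feat, ms) = min(PROFILED.items(),
                            key=lambda kv: _dist(features, kv[1][0]))
    return ms * features[0] / feat[0]

# (2) Time-aware scheduling: greedily send each task to the device with
# the earliest predicted finish time (per-device slowdown factors assumed).
def schedule(tasks, device_speeds):
    busy_until = {d: 0.0 for d in device_speeds}
    plan = []
    for name, feats in tasks:
        est = predict_latency(feats)
        # finish time = when the device frees up + device-scaled run time
        dev = min(device_speeds,
                  key=lambda d: busy_until[d] + est * device_speeds[d])
        busy_until[dev] += est * device_speeds[dev]
        plan.append((name, dev))
    return plan, busy_until
```

Under these assumptions, two identical tasks land on two different idle devices, which is the load-balancing behavior a time-aware scheduler should exhibit; the real system replaces the nearest-neighbor estimate with the PCN's prediction.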



Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
July 2022
1462 pages
ISBN: 978-1-4503-9142-9
DOI: 10.1145/3489517

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. AI system
  2. cloud-edge collaborative inference
  3. heterogeneous computing
  4. neural network similarity
  5. scheduling

Qualifiers

  • Research-article

Conference

DAC '22: 59th ACM/IEEE Design Automation Conference
July 10-14, 2022
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%


Article Metrics

  • Downloads (last 12 months): 130
  • Downloads (last 6 weeks): 15
Reflects downloads up to 05 Mar 2025


Cited By

  • (2024) EdgeCloudAI: Edge-Cloud Distributed Video Analytics. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pp. 1778-1780. DOI: 10.1145/3636534.3698857. Online publication date: 4 Dec 2024.
  • (2024) Arch2End: Two-Stage Unified System-Level Modeling for Heterogeneous Intelligent Devices. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(11), pp. 4154-4165. DOI: 10.1109/TCAD.2024.3443706. Online publication date: Nov 2024.
  • (2024) FedStar: Efficient Federated Learning on Heterogeneous Communication Networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(6), pp. 1848-1861. DOI: 10.1109/TCAD.2023.3346274. Online publication date: Jun 2024.
  • (2023) JAVP: Joint-Aware Video Processing with Edge-Cloud Collaboration for DNN Inference. Proceedings of the 31st ACM International Conference on Multimedia, pp. 9152-9160. DOI: 10.1145/3581783.3613914. Online publication date: 26 Oct 2023.
  • (2023) Ace-Sniper: Cloud-Edge Collaborative Scheduling Framework With DNN Inference Latency Modeling on Heterogeneous Devices. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(2), pp. 534-547. DOI: 10.1109/TCAD.2023.3314388. Online publication date: 12 Sep 2023.
  • (2023) Shoggoth: Towards Efficient Edge-Cloud Collaborative Real-Time Video Inference via Adaptive Online Learning. 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6. DOI: 10.1109/DAC56929.2023.10247821. Online publication date: 9 Jul 2023.
  • (2023) A Survey on Collaborative DNN Inference for Edge Intelligence. Machine Intelligence Research, 20(3), pp. 370-395. DOI: 10.1007/s11633-022-1391-7. Online publication date: 3 May 2023.
