
Sniper: cloud-edge collaborative inference scheduling with neural network similarity modeling

Published: 23 August 2022

Abstract

Cloud-edge collaborative inference requires efficiently scheduling artificial intelligence (AI) tasks onto appropriate edge smart devices. However, continuously iterating deep neural networks (DNNs) and heterogeneous devices pose great challenges for inference task scheduling. In this paper, we propose Sniper, a self-updating cloud-edge collaborative inference scheduling system with time awareness. First, observing that similar networks exhibit similar behaviors, we develop a non-invasive performance characterization network (PCN) based on neural network similarity (NNS) to accurately predict the inference time of DNNs. Moreover, the PCN and time-based scheduling algorithms can be flexibly combined in the scheduling module of Sniper. Experimental results show that the average relative error of network inference time prediction is about 8.06%. Compared with a traditional method without time awareness, Sniper reduces waiting time by 52% on average while achieving a stable increase in throughput.
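As a rough illustration of the abstract's two ideas, the sketch below first predicts an unseen DNN's inference time from its most similar already-profiled network (a crude nearest-neighbor stand-in for the paper's learned PCN), then dispatches tasks to heterogeneous devices by earliest predicted finish time. All feature vectors, device names, and numbers are invented for illustration and are not taken from the paper.

```python
import math

# (1) Similarity-based latency prediction: each profiled network is
# described by a toy feature vector (GFLOPs, Mparams, depth) plus a
# measured latency; an unseen network inherits the latency of its
# nearest profiled neighbour, scaled by the FLOPs ratio.
PROFILED = {
    # name: (features=(gflops, mparams, depth), measured_ms)
    "resnet18":  ((1.8, 11.7, 18), 42.0),
    "resnet50":  ((4.1, 25.6, 50), 98.0),
    "mobilenet": ((0.6, 4.2, 28), 15.0),
}

def _dist(a, b):
    # plain Euclidean distance between feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_latency(features):
    """Nearest-neighbour latency estimate, scaled by the FLOPs ratio."""
    _name, (feat, ms) = min(PROFILED.items(),
                            key=lambda kv: _dist(features, kv[1][0]))
    return ms * features[0] / feat[0]

# (2) Time-aware scheduling: greedily send each task to the device with
# the earliest predicted finish time (per-device slowdown factors assumed).
def schedule(tasks, device_speeds):
    busy_until = {d: 0.0 for d in device_speeds}
    plan = []
    for name, feats in tasks:
        est = predict_latency(feats)
        # finish time = when the device frees up + device-scaled run time
        dev = min(device_speeds,
                  key=lambda d: busy_until[d] + est * device_speeds[d])
        busy_until[dev] += est * device_speeds[dev]
        plan.append((name, dev))
    return plan, busy_until
```

Under these assumptions, two identical tasks land on two different idle devices, which is the load-balancing behavior a time-aware scheduler should exhibit; the real system replaces the nearest-neighbor estimate with the PCN's prediction.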



Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
July 2022
1462 pages
ISBN: 978-1-4503-9142-9
DOI: 10.1145/3489517

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. AI system
  2. cloud-edge collaborative inference
  3. heterogeneous computing
  4. neural network similarity
  5. scheduling

Qualifiers

  • Research-article

Conference

DAC '22: 59th ACM/IEEE Design Automation Conference
July 10-14, 2022
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%


Article Metrics

  • Downloads (last 12 months): 130
  • Downloads (last 6 weeks): 15
Reflects downloads up to 05 Mar 2025


Cited By

  • (2024) EdgeCloudAI: Edge-Cloud Distributed Video Analytics. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pp. 1778-1780. DOI: 10.1145/3636534.3698857. Online publication date: 4 Dec 2024.
  • (2024) Arch2End: Two-Stage Unified System-Level Modeling for Heterogeneous Intelligent Devices. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(11), pp. 4154-4165. DOI: 10.1109/TCAD.2024.3443706. Online publication date: Nov 2024.
  • (2024) FedStar: Efficient Federated Learning on Heterogeneous Communication Networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(6), pp. 1848-1861. DOI: 10.1109/TCAD.2023.3346274. Online publication date: Jun 2024.
  • (2023) JAVP: Joint-Aware Video Processing with Edge-Cloud Collaboration for DNN Inference. Proceedings of the 31st ACM International Conference on Multimedia, pp. 9152-9160. DOI: 10.1145/3581783.3613914. Online publication date: 26 Oct 2023.
  • (2023) Ace-Sniper: Cloud-Edge Collaborative Scheduling Framework With DNN Inference Latency Modeling on Heterogeneous Devices. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(2), pp. 534-547. DOI: 10.1109/TCAD.2023.3314388. Online publication date: 12 Sep 2023.
  • (2023) Shoggoth: Towards Efficient Edge-Cloud Collaborative Real-Time Video Inference via Adaptive Online Learning. 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6. DOI: 10.1109/DAC56929.2023.10247821. Online publication date: 9 Jul 2023.
  • (2023) A Survey on Collaborative DNN Inference for Edge Intelligence. Machine Intelligence Research, 20(3), pp. 370-395. DOI: 10.1007/s11633-022-1391-7. Online publication date: 3 May 2023.
