research-article

Multi-Compression Scale DNN Inference Acceleration based on Cloud-Edge-End Collaboration

Authors:

Xiaoheng Deng

ACM Transactions on Embedded Computing Systems, Volume 23, Issue 1

Article No.: 16, Pages 1 - 25

https://doi.org/10.1145/3634704

Published: 19 January 2024

Abstract

Edge intelligence has emerged as a promising paradigm for accelerating DNN inference through model partitioning, which is particularly useful for intelligent scenarios that demand high accuracy and low latency. However, the dynamic nature of the edge environment and the diversity of end devices pose a significant challenge for DNN model partitioning strategies. Meanwhile, the limited resources of the edge server make it difficult to allocate resources efficiently among multiple devices. In addition, most existing studies disregard the different service requirements of DNN inference tasks, such as whether a task is highly accuracy-sensitive or highly latency-sensitive. To address these challenges, we propose Multi-Compression Scale DNN Inference Acceleration (MCIA), based on cloud-edge-end collaboration. We formulate the problem as a mixed-integer multi-dimensional optimization problem that jointly optimizes the DNN model version choice, the partitioning choice, and the allocation of computational and bandwidth resources, seeking the best tradeoff between inference accuracy and latency according to the properties of the tasks. First, we train multiple versions of DNN inference models with different compression scales in the cloud and deploy them to the end devices and the edge server. Next, we develop a deep reinforcement learning-based algorithm that jointly decides on adaptive collaborative inference and resource allocation based on the current multi-compression scale models and the task properties. Experimental results show that MCIA adapts to heterogeneous devices and dynamic networks and outperforms the methods it is compared against.
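To make the joint decision described in the abstract concrete, the following is a minimal sketch for a single task, under stated assumptions. It is not the MCIA algorithm: the abstract describes a deep reinforcement learning-based method that also allocates computational and bandwidth resources among multiple devices, whereas this sketch substitutes a plain exhaustive search over (model version, partition point) pairs with fixed resources and only two tiers (device and edge). The names `ModelVersion`, `latency`, and `best_choice`, the weighted utility `weight * accuracy - (1 - weight) * latency`, and all numbers are hypothetical illustrations, not taken from the article.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ModelVersion:
    """One compression scale of a DNN (all values are hypothetical)."""
    name: str
    accuracy: float                 # accuracy in [0, 1]
    layer_flops: Tuple[float, ...]  # FLOPs of each layer, in execution order
    # cut_bytes[k] = bytes uploaded when the first k layers run on the device
    # (k = 0 uploads the raw input); one entry per layer.
    cut_bytes: Tuple[float, ...]


def latency(version: ModelVersion, cut: int, device_flops: float,
            edge_flops: float, bandwidth: float) -> float:
    """End-to-end latency in seconds when layers [0, cut) run on the device."""
    n = len(version.layer_flops)
    device_time = sum(version.layer_flops[:cut]) / device_flops
    edge_time = sum(version.layer_flops[cut:]) / edge_flops
    # Nothing is transmitted when the whole model runs on the device.
    tx_time = 0.0 if cut == n else version.cut_bytes[cut] / bandwidth
    return device_time + tx_time + edge_time


def best_choice(versions: Sequence[ModelVersion], weight: float,
                device_flops: float, edge_flops: float,
                bandwidth: float) -> Tuple[str, int, float]:
    """Exhaustively pick the (version, cut) pair with the highest utility.

    weight near 1 models an accuracy-sensitive task, weight near 0 a
    latency-sensitive one. The utility form is an assumption of this sketch.
    """
    best = None
    for version in versions:
        for cut in range(len(version.layer_flops) + 1):
            t = latency(version, cut, device_flops, edge_flops, bandwidth)
            utility = weight * version.accuracy - (1.0 - weight) * t
            if best is None or utility > best[0]:
                best = (utility, version.name, cut, t)
    if best is None:
        raise ValueError("at least one model version is required")
    return best[1], best[2], best[3]


if __name__ == "__main__":
    versions = [
        ModelVersion("full", 0.9, (4e9, 4e9), (1e6, 2e5)),
        ModelVersion("compressed", 0.8, (1e9, 1e9), (1e6, 5e4)),
    ]
    for w in (0.9, 0.5):
        print(w, best_choice(versions, w, device_flops=1e9,
                             edge_flops=1e10, bandwidth=1e6))
```

With these made-up numbers, a mostly accuracy-sensitive task (weight 0.9) selects the uncompressed model fully offloaded to the edge, while a balanced task (weight 0.5) selects the compressed model split after its first layer. Exhaustive search is only feasible here because the decision space is tiny; once resource allocation across many devices is added, the space grows quickly, which is the setting in which the abstract turns to a learning-based method.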

Index Terms

Multi-Compression Scale DNN Inference Acceleration based on Cloud-Edge-End Collaboration
1. Computing methodologies
  1. Artificial intelligence
    1. Distributed artificial intelligence
      1. Cooperation and coordination
  2. Parallel computing methodologies
    1. Parallel algorithms
      1. Massively parallel algorithms
2. Networks
  1. Network algorithms
    1. Control path algorithms
      1. Network resources allocation

Published In

ACM Transactions on Embedded Computing Systems Volume 23, Issue 1

January 2024

406 pages

EISSN: 1558-3465

DOI: 10.1145/3613501

Editor:
Tulika Mitra
National University of Singapore, Singapore

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 19 January 2024

Online AM: 28 November 2023

Accepted: 14 November 2023

Revised: 05 November 2023

Received: 05 June 2023

Published in TECS Volume 23, Issue 1

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
National Natural Science Foundation of Hunan Province
Opening Project of State Key Laboratory of Nickel and Cobalt Resources Comprehensive Utilization
Shenzhen Science and Technology Program
