
Multi-Compression Scale DNN Inference Acceleration based on Cloud-Edge-End Collaboration

Published: 19 January 2024

Abstract

Edge intelligence has emerged as a promising paradigm for accelerating DNN inference through model partitioning, which is particularly useful for intelligent scenarios that demand high accuracy and low latency. However, the dynamic nature of the edge environment and the diversity of end devices pose a significant challenge for DNN model partitioning strategies. Meanwhile, the limited resources of the edge server make it difficult to allocate resources efficiently among multiple devices. In addition, most existing studies disregard the different service requirements of DNN inference tasks, such as whether a task is highly accuracy-sensitive or highly latency-sensitive. To address these challenges, we propose Multi-Compression Scale DNN Inference Acceleration (MCIA), an approach based on cloud-edge-end collaboration. We model the problem as a mixed-integer multi-dimensional optimization problem that jointly optimizes the choice of DNN model version, the choice of partition point, and the allocation of computational and bandwidth resources, so as to achieve the best tradeoff between inference accuracy and latency given the properties of each task. First, we train multiple versions of the DNN inference model with different compression scales in the cloud and deploy them to the end devices and the edge server. Next, we develop a deep reinforcement learning-based algorithm that jointly decides on adaptive collaborative inference and resource allocation, based on the available multi-compression scale models and the task properties. Experimental results show that MCIA adapts to heterogeneous devices and dynamic networks and outperforms the methods it is compared against.
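To make the joint decision described in the abstract concrete, the following minimal Python sketch picks a model version and a partition point for one task. It is an illustration under stated assumptions, not the MCIA algorithm: exhaustive search stands in for the deep reinforcement learning agent, the allocation of computational and bandwidth resources across multiple devices is left out, and the utility form `w * accuracy - (1 - w) * latency` is assumed. All names (`ModelVersion`, `latency_s`, `utility`, `best_decision`) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """One compression scale of a DNN (all values are made up for illustration)."""
    name: str
    accuracy: float        # assumed accuracy in [0, 1]
    layer_gflop: tuple     # compute cost of each layer, in GFLOP
    cut_sizes_mb: tuple    # cut_sizes_mb[k]: MB sent to the edge when the
                           # first k layers run on the device (k = 0 is raw input)


def latency_s(v, k, device_gflops, edge_gflops, bandwidth_mbps):
    """End-to-end latency when layers [0, k) run on the device and the rest on the edge."""
    n = len(v.layer_gflop)
    device_time = sum(v.layer_gflop[:k]) / device_gflops
    edge_time = sum(v.layer_gflop[k:]) / edge_gflops
    # Nothing is transmitted when the whole model runs on the device (k == n).
    tx_time = 0.0 if k == n else v.cut_sizes_mb[k] * 8 / bandwidth_mbps
    return device_time + tx_time + edge_time


def utility(v, k, w, device_gflops, edge_gflops, bandwidth_mbps):
    """Assumed tradeoff: w near 1 is accuracy-sensitive, w near 0 is latency-sensitive."""
    lat = latency_s(v, k, device_gflops, edge_gflops, bandwidth_mbps)
    return w * v.accuracy - (1 - w) * lat


def best_decision(versions, w, device_gflops, edge_gflops, bandwidth_mbps):
    """Exhaustively search every (version, partition point) pair for the highest utility."""
    candidates = (
        (utility(v, k, w, device_gflops, edge_gflops, bandwidth_mbps), v.name, k)
        for v in versions
        for k in range(len(v.layer_gflop) + 1)
    )
    _, name, k = max(candidates)
    return name, k


# Hypothetical four-layer model at two compression scales.
VERSIONS = [
    ModelVersion("full", 0.93, (0.4, 0.6, 0.8, 0.2), (0.6, 1.2, 0.3, 0.1)),
    ModelVersion("compressed", 0.88, (0.1, 0.15, 0.2, 0.05), (0.6, 0.6, 0.15, 0.05)),
]

# Accuracy-sensitive task on a 20 Mbps link: full model, fully offloaded.
print(best_decision(VERSIONS, 0.9, 2.0, 20.0, 20.0))  # ('full', 0)
# Latency-sensitive task on the same link: compressed model, split after layer 2.
print(best_decision(VERSIONS, 0.2, 2.0, 20.0, 20.0))  # ('compressed', 2)
```

With these made-up numbers, lowering the bandwidth to 1 Mbps moves the latency-sensitive task entirely onto the device, which is the kind of adaptation to dynamic networks the abstract refers to. A learned policy would replace the exhaustive search once per-device resource shares enlarge the decision space.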



Published In

ACM Transactions on Embedded Computing Systems, Volume 23, Issue 1
January 2024
406 pages
EISSN: 1558-3465
DOI: 10.1145/3613501
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 January 2024
Online AM: 28 November 2023
Accepted: 14 November 2023
Revised: 05 November 2023
Received: 05 June 2023
Published in TECS Volume 23, Issue 1

Author Tags

  1. Edge intelligence
  2. DNN inference
  3. model partitioning
  4. multi-compression scale
  5. deep reinforcement learning

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • National Natural Science Foundation of Hunan Province
  • Opening Project of State Key Laboratory of Nickel and Cobalt Resources Comprehensive Utilization
  • Shenzhen Science and Technology Program
