
A DNN inference acceleration algorithm combining model partition and task allocation in heterogeneous edge computing system


Abstract

Edge intelligence is a new computing paradigm that offloads part of an Artificial Intelligence (AI) task to the edge in order to reduce latency, lower energy consumption, and improve privacy. Deep Neural Networks (DNNs), the most important technique in AI, have been widely used in many fields, and for DNN-based tasks a computing scheme called DNN model partition can further reduce execution time. This scheme splits a DNN task into two parts: one part is executed on the end device and the other on an edge server. In a complex edge computing system, however, it is difficult to coordinate DNN model partition with task allocation. In this work, we study this problem in a heterogeneous edge computing system. We first establish a mathematical model of adaptive DNN model partition and task offloading. Because this model contains a large number of binary variables, its solution space is too large to be searched directly in a multi-task scenario. We therefore use dynamic programming and a greedy strategy to shrink the solution space while preserving solution quality, and propose an offline algorithm named GSPI. Considering practical deployments, we then propose an online algorithm. Our experiments and simulations show that, compared with end-only and server-only execution, the proposed GSPI algorithm reduces system time cost by 30% on average, and the online algorithm reduces it by 28% on average.
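The core idea of DNN model partition — run the first layers on the device, ship the intermediate feature map to the edge server, and run the remaining layers there — can be sketched as a search over split points. The sketch below is illustrative only: the per-layer latency and output-size numbers, and the helper `best_partition`, are hypothetical assumptions for a single task, not the paper's profiling data or the GSPI algorithm.

```python
# Hypothetical per-layer profile for one DNN task:
# (device_time_ms, server_time_ms, output_size_kb)
LAYERS = [
    (20.0, 2.0, 400.0),
    (30.0, 3.0, 100.0),
    (40.0, 4.0, 20.0),
    (25.0, 2.0, 5.0),
]

def best_partition(layers, bandwidth_kbps, input_kb):
    """Pick the split point k that minimizes total latency.

    layers[:k] run on the end device, layers[k:] on the edge server.
    k == 0 offloads everything (the raw input is transmitted);
    k == len(layers) keeps everything on the device (nothing is sent).
    """
    best_k, best_cost = 0, float("inf")
    for k in range(len(layers) + 1):
        device_ms = sum(l[0] for l in layers[:k])
        server_ms = sum(l[1] for l in layers[k:])
        if k == 0:
            data_kb = input_kb              # server-only: send raw input
        elif k == len(layers):
            data_kb = 0.0                   # end-only: nothing to send
        else:
            data_kb = layers[k - 1][2]      # send intermediate feature map
        transfer_ms = data_kb / bandwidth_kbps * 1000.0
        cost = device_ms + transfer_ms + server_ms
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost

k, cost = best_partition(LAYERS, bandwidth_kbps=1000.0, input_kb=600.0)
print(k, cost)
```

Because intermediate feature maps usually shrink with depth while device-side compute accumulates, an interior split is often cheaper than either extreme; coordinating such splits across many tasks competing for heterogeneous servers is what makes the joint problem hard.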



Acknowledgements

This article was supported by the National Key Research and Development Plan (Grant No. 2018YFB2000505), the National Natural Science Foundation of China (Grant No. 61806067), and the Key Research and Development Project in Anhui Province (Grant No. 201904a06020024).

Corresponding author

Correspondence to Zhigang Xu.


Cite this article

Shi, L., Xu, Z., Sun, Y. et al. A DNN inference acceleration algorithm combining model partition and task allocation in heterogeneous edge computing system. Peer-to-Peer Netw. Appl. 14, 4031–4045 (2021). https://doi.org/10.1007/s12083-021-01223-1
