DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing

Li, Chao; Xu, Hongli; Xu, Yang; Wang, Zhiyuan; Huang, Liusheng

doi:10.1007/978-3-030-85928-2_37

Conference paper
First Online: 02 September 2021

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12937)

Abstract

Recently, deep neural networks (DNNs) have been applied to most intelligent applications and deployed on many kinds of devices. However, DNN inference is resource-intensive. In edge computing in particular, DNN inference must cope with the constrained computing resources of end devices and with excessive data transmission costs when raw data is offloaded to the edge server. A better solution is DNN partitioning, which splits the DNN into two parts, one running on the end device and the other on the edge server. However, one edge server often needs to serve multiple end devices simultaneously, which may cause excessive queueing delay. To meet the latency requirements of real-time DNN tasks, we combine the early-exit mechanism with DNN partitioning. We formally define DNN inference with partitioning and early exit as an optimization problem. To solve it, we propose two efficient algorithms that determine the partition points of DNN partitioning and the thresholds of the early-exit mechanism. Extensive simulations show that the proposed algorithms can dramatically accelerate DNN inference while achieving high accuracy.
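
The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithms, which the abstract does not describe. It assumes a simple chain-structured DNN, a single end device, no queueing delay at the server, and a max-softmax confidence test for early exit. All function names, parameters, and numbers (`partition_latency`, `best_partition`, `early_exit`, `transfer_kb`, and so on) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def partition_latency(p: int, device_ms: Sequence[float], server_ms: Sequence[float],
                      transfer_kb: Sequence[float], bandwidth_kb_per_ms: float) -> float:
    """Latency when layers [0, p) run on the device and layers [p, n) on the server.

    transfer_kb[p] is the amount of data uploaded when the cut is placed before
    layer p (transfer_kb[0] is the raw input). Nothing is uploaded when p == n.
    """
    n = len(device_ms)
    compute = sum(device_ms[:p]) + sum(server_ms[p:])
    upload = 0.0 if p == n else transfer_kb[p] / bandwidth_kb_per_ms
    return compute + upload


def best_partition(device_ms: Sequence[float], server_ms: Sequence[float],
                   transfer_kb: Sequence[float], bandwidth_kb_per_ms: float) -> Tuple[int, float]:
    """Exhaustively try every cut point and return (best cut, its latency)."""
    n = len(device_ms)
    latencies: List[float] = [
        partition_latency(p, device_ms, server_ms, transfer_kb, bandwidth_kb_per_ms)
        for p in range(n + 1)
    ]
    best = min(range(n + 1), key=latencies.__getitem__)
    return best, latencies[best]


def early_exit(exit_probs: Sequence[Sequence[float]],
               thresholds: Sequence[float]) -> Tuple[int, int]:
    """Return (exit index, predicted class) for the first sufficiently confident exit.

    exit_probs[i] is the class distribution produced at exit i. The last exit
    always answers, so thresholds needs one entry fewer than exit_probs.
    """
    for i, probs in enumerate(exit_probs):
        top = max(range(len(probs)), key=probs.__getitem__)
        is_last = i == len(exit_probs) - 1
        if is_last or probs[top] >= thresholds[i]:
            return i, top
    raise ValueError("exit_probs must be non-empty")


# Hypothetical profile of a three-layer model (made-up numbers).
device_ms = [4.0, 6.0, 10.0]        # per-layer time on the end device
server_ms = [1.0, 1.5, 2.5]         # per-layer time on the edge server
transfer_kb = [600.0, 100.0, 50.0]  # data uploaded for a cut before layer 0, 1, 2
cut, latency = best_partition(device_ms, server_ms, transfer_kb, bandwidth_kb_per_ms=10.0)
```

With these made-up numbers, cutting after the second layer is cheapest, because the intermediate output is much smaller than the raw input while the device still avoids the most expensive layer. In a multi-device setting such as the one the abstract targets, the server term would also need a queueing component, which this sketch omits.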

Author information

Authors and Affiliations

School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
Chao Li & Zhiyuan Wang
Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, 215123, China
Hongli Xu, Yang Xu & Liusheng Huang

Corresponding author

Correspondence to Yang Xu.

Editor information

Editors and Affiliations

Nanjing University of Aeronautics and Astronautics, Nanjing, China
Zhe Liu
Shanghai Jiao Tong University, Shanghai, China
Fan Wu
Missouri University of Science and Technology, Rolla, MO, USA
Sajal K. Das

About this paper

Cite this paper

Li, C., Xu, H., Xu, Y., Wang, Z., Huang, L. (2021). DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing. In: Liu, Z., Wu, F., Das, S.K. (eds) Wireless Algorithms, Systems, and Applications. WASA 2021. Lecture Notes in Computer Science, vol 12937. Springer, Cham. https://doi.org/10.1007/978-3-030-85928-2_37

DOI: https://doi.org/10.1007/978-3-030-85928-2_37
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85927-5
Online ISBN: 978-3-030-85928-2
eBook Packages: Computer Science, Computer Science (R0)
