Abstract
Deep neural networks (DNNs) have recently been applied to a wide range of intelligent applications and deployed on many kinds of devices. However, DNN inference is resource-intensive. In edge computing in particular, DNN inference must contend with the limited computing resources of end devices and the high cost of transmitting raw data when offloading it to the edge server. A better solution is DNN partitioning, which splits the DNN into two parts: one runs on the end device and the other on the edge server. However, an edge server often has to serve multiple end devices simultaneously, which may cause excessive queueing delay. To meet the latency requirements of real-time DNN tasks, we combine the early-exit mechanism with DNN partitioning. We formally define DNN inference with partitioning and early exiting as an optimization problem. To solve it, we propose two efficient algorithms that determine the partition points for DNN partitioning and the thresholds for the early-exit mechanism. Extensive simulations show that the proposed algorithms dramatically accelerate DNN inference while achieving high accuracy.
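The latency trade-off behind DNN partitioning can be illustrated with a small sketch. It assumes a per-layer profile (device time, server time, output size), a fixed uplink bandwidth, and an M/M/1 queue at the edge server as a stand-in for contention from multiple end devices. The `Layer` class, the function names, the queueing model and the numbers are illustrative assumptions, not the formulation or algorithm of the paper. The sketch evaluates every cut point and keeps the one with the lowest end-to-end latency.

```python
import math
from dataclasses import dataclass


@dataclass
class Layer:
    device_ms: float  # compute time of the layer on the end device (ms)
    server_ms: float  # compute time of the layer on the edge server (ms)
    out_kb: float     # size of the layer's output feature map (KB)


def latency_ms(layers, input_kb, p, bandwidth_kb_per_ms, arrival_per_ms):
    """End-to-end latency if layers[:p] run on the device and layers[p:] on the server."""
    device = sum(l.device_ms for l in layers[:p])
    if p == len(layers):
        return device  # fully local: nothing is sent or queued
    # Cutting at p = 0 sends the raw input; otherwise the output of layer p - 1.
    sent_kb = input_kb if p == 0 else layers[p - 1].out_kb
    transmit = sent_kb / bandwidth_kb_per_ms
    service = sum(l.server_ms for l in layers[p:])
    # Assumed M/M/1 queue at the shared edge server, utilisation rho = lambda * S.
    rho = arrival_per_ms * service
    if rho >= 1:
        return math.inf  # the server cannot keep up; the queue grows without bound
    queueing = rho * service / (1 - rho)  # mean M/M/1 waiting time
    return device + transmit + queueing + service


def best_partition(layers, input_kb, bandwidth_kb_per_ms, arrival_per_ms):
    """Try every cut point 0..len(layers) and return the one with the lowest latency."""
    return min(
        range(len(layers) + 1),
        key=lambda p: latency_ms(layers, input_kb, p, bandwidth_kb_per_ms, arrival_per_ms),
    )


if __name__ == "__main__":
    # Hypothetical three-layer profile, for illustration only.
    net = [Layer(10, 1.0, 400), Layer(15, 1.5, 50), Layer(60, 3.0, 4)]
    print(best_partition(net, input_kb=1000, bandwidth_kb_per_ms=10, arrival_per_ms=0.05))  # 2
    print(best_partition(net, input_kb=1000, bandwidth_kb_per_ms=10, arrival_per_ms=1.0))   # 3
```

With a lightly loaded server the sketch cuts after the second layer, where the feature map is small. When many devices push the server's utilisation past 1, every offloading option becomes unbounded and inference falls back to the device, which is the queueing effect the abstract points to. Exhaustive search is enough here because a chain-structured DNN has only `len(layers) + 1` cut points.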
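The early-exit side of the problem can be sketched in a similar way. Following the entropy criterion used by BranchyNet, a sample leaves at a side branch when the entropy of that branch's softmax output is at most a threshold. The sketch then grid-searches the threshold on a labelled validation set for the lowest mean latency that still meets an accuracy target. The single side branch, the fixed per-exit latencies, the sample dictionary format and the function names are assumptions made for illustration; this is not the threshold-selection algorithm proposed in the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a softmax vector; lower means more confident."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def evaluate(samples, threshold, branch_ms, full_ms):
    """Accuracy and mean latency when samples with branch entropy <= threshold exit early."""
    correct, total_ms = 0, 0.0
    for s in samples:
        if entropy(s["branch_probs"]) <= threshold:
            correct += int(s["branch_correct"])  # answered by the side branch
            total_ms += branch_ms
        else:
            correct += int(s["final_correct"])   # continues to the final exit
            total_ms += full_ms
    return correct / len(samples), total_ms / len(samples)


def pick_threshold(samples, branch_ms, full_ms, min_accuracy):
    """Return (threshold, accuracy, latency) with the lowest latency meeting min_accuracy, or None."""
    # -1.0 means "never exit early"; each observed entropy is a distinct cut-off.
    candidates = [-1.0] + sorted(entropy(s["branch_probs"]) for s in samples)
    best = None
    for t in candidates:
        acc, lat = evaluate(samples, t, branch_ms, full_ms)
        if acc >= min_accuracy and (best is None or lat < best[2]):
            best = (t, acc, lat)
    return best


if __name__ == "__main__":
    # Hypothetical validation outputs of a two-class model, for illustration only.
    val = [
        {"branch_probs": [0.95, 0.05], "branch_correct": True, "final_correct": True},
        {"branch_probs": [0.90, 0.10], "branch_correct": True, "final_correct": True},
        {"branch_probs": [0.60, 0.40], "branch_correct": False, "final_correct": True},
        {"branch_probs": [0.55, 0.45], "branch_correct": False, "final_correct": False},
    ]
    t, acc, lat = pick_threshold(val, branch_ms=10, full_ms=30, min_accuracy=0.75)
    print(round(t, 3), acc, lat)  # 0.325 0.75 20.0
```

Raising the threshold lets more confident samples skip the remaining layers, lowering mean latency until the side branch's mistakes push accuracy below the target. In a partitioned deployment, an early exit on the device would also avoid the transmission and queueing costs modelled in the partitioning sketch.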
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, C., Xu, H., Xu, Y., Wang, Z., Huang, L. (2021). DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing. In: Liu, Z., Wu, F., Das, S.K. (eds) Wireless Algorithms, Systems, and Applications. WASA 2021. Lecture Notes in Computer Science, vol 12937. Springer, Cham. https://doi.org/10.1007/978-3-030-85928-2_37
DOI: https://doi.org/10.1007/978-3-030-85928-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85927-5
Online ISBN: 978-3-030-85928-2
eBook Packages: Computer Science, Computer Science (R0)