DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12937)

Abstract

Recently, deep neural networks (DNNs) have been applied in a wide range of intelligent applications and deployed on many kinds of devices. However, DNN inference is resource-intensive. In edge computing in particular, inference must contend with the constrained computing resources of end devices and the excessive data transmission cost of offloading raw data to the edge server. A better solution is DNN partitioning, which splits the DNN into two parts, one running on the end device and the other on the edge server. However, one edge server often serves multiple end devices simultaneously, which may cause excessive queueing delay. To meet the latency requirements of real-time DNN tasks, we combine the early-exit mechanism with DNN partitioning. We formally define DNN inference with partitioning and early exiting as an optimization problem, and we propose two efficient algorithms to determine the partition points of DNN partitioning and the thresholds of the early-exit mechanism. Extensive simulations show that the proposed algorithms dramatically accelerate DNN inference while maintaining high accuracy.
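To make the mechanism concrete, the following is a minimal sketch, not the authors' implementation: the toy network, the layer shapes, and the threshold value of 0.6 are all assumptions. It shows how a partition point and an early-exit confidence threshold could interact: the end device runs the layers up to the partition point, exits locally when the attached exit branch is confident enough, and otherwise hands the intermediate features to the edge server for the remaining layers.

    # Minimal sketch in NumPy (illustrative only; not the paper's algorithms).
    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # Toy "DNN": each layer is a random linear map followed by tanh.
    layers = [rng.standard_normal((16, 16)) for _ in range(8)]
    exit_head = rng.standard_normal((10, 16))   # early-exit classifier branch
    final_head = rng.standard_normal((10, 16))  # classifier after the last layer

    def infer(x, partition_point, threshold):
        """Run layers[:partition_point] on the device; exit early if the
        exit branch is confident, otherwise 'offload' the rest to the server."""
        for W in layers[:partition_point]:      # device-side layers
            x = np.tanh(W @ x)
        probs = softmax(exit_head @ x)
        if probs.max() >= threshold:            # confident enough: exit early
            return probs.argmax(), "device (early exit)"
        for W in layers[partition_point:]:      # server-side layers
            x = np.tanh(W @ x)
        return softmax(final_head @ x).argmax(), "server (full inference)"

    label, where = infer(rng.standard_normal(16), partition_point=3, threshold=0.6)
    print(label, where)

Raising the threshold trades latency for accuracy: fewer inputs exit at the device, so more traverse the full network on the server. Likewise, the choice of partition point trades device-side computation against the size of the intermediate features transmitted over the network.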



Author information


Corresponding author

Correspondence to Yang Xu.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, C., Xu, H., Xu, Y., Wang, Z., Huang, L. (2021). DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing. In: Liu, Z., Wu, F., Das, S.K. (eds) Wireless Algorithms, Systems, and Applications. WASA 2021. Lecture Notes in Computer Science, vol 12937. Springer, Cham. https://doi.org/10.1007/978-3-030-85928-2_37

  • DOI: https://doi.org/10.1007/978-3-030-85928-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85927-5

  • Online ISBN: 978-3-030-85928-2

  • eBook Packages: Computer Science, Computer Science (R0)
