Fully Distributed Deep Learning Inference on Resource-Constrained Edge Devices

  • Conference paper

Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11733)

Abstract

Performing the inference tasks of deep learning applications on IoT edge devices preserves the privacy of input data and can result in lower latency than a cloud solution. As most edge devices are memory- and compute-constrained, they cannot store and execute a complete Deep Neural Network (DNN). One possible solution is to distribute the DNN across multiple edge devices. For a complete distribution, weight-intensive fully-connected layers as well as feature- and weight-intensive convolutional layers need to be partitioned to reduce the amount of computation and data on each resource-constrained edge device. At the same time, the resulting communication overheads need to be considered. Existing work on distributed DNN execution either cannot support all types of networks and layers or does not account for layer fusion opportunities to reduce communication. In this paper, we jointly optimize the memory, computation, and communication demands for distributed execution of complete neural networks covering all layers. This is achieved through techniques that combine feature and weight partitioning with a communication-aware layer fusion approach, enabling holistic optimization across layers. For a given number of edge devices, the schemes are applied jointly such that the amount of data exchanged between devices, and thus run time, is minimized.
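
To make the two partitioning schemes and the role of layer fusion concrete, the following minimal Python sketch (our own illustration, not the paper's implementation; all function and variable names are hypothetical) shows weight partitioning of a fully-connected layer and how fusing stacked convolutional layers enlarges each device's input tile in exchange for fewer inter-device data exchanges:

    # Illustrative sketch only, not the paper's code.
    import numpy as np

    def fused_input_rows(out_rows, kernel=3, stride=1, fused_layers=1):
        # Input rows a device needs to produce `out_rows` rows of its output
        # feature-map slice after `fused_layers` stacked convolutions
        # (padding ignored for clarity). Fusing grows the overlap ("halo")
        # once, but boundary data is exchanged once instead of per layer.
        rows = out_rows
        for _ in range(fused_layers):
            rows = (rows - 1) * stride + kernel
        return rows

    def partition_fc_weights(weights, num_devices):
        # Weight partitioning: split a fully-connected weight matrix of
        # shape (out_features, in_features) by output neurons, so each
        # device stores and applies only its slice of the weights.
        return np.array_split(weights, num_devices, axis=0)

    # Feature partitioning: each of 6 devices computes a 70-row horizontal
    # slice of an output feature map.
    print(fused_input_rows(70, fused_layers=1))  # 72 rows, exchange after every layer
    print(fused_input_rows(70, fused_layers=3))  # 76 rows, one exchange for 3 layers

The sketch exposes the trade-off the abstract describes: fusion slightly increases redundant computation and storage at tile borders, but reduces how often and how much the devices must communicate.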

Experimental results for a simulation of six edge devices with 100 Mbit/s connections running the YOLOv2 DNN model show that the schemes evenly balance the memory footprint across devices. Integrating layer fusion additionally reduces communication demands by 14.8%, which speeds up the inference task by 1.15x compared to partitioning without fusion.
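
As a back-of-envelope sanity check (our own reading of the reported numbers, not a figure from the paper), an Amdahl-style estimate under the assumption that fusion changed only communication time suggests that communication dominates the un-fused run time in this setup:

    # Back-of-envelope estimate; assumes fusion changed only communication
    # time and that transfer time scales with data volume on the fixed
    # 100 Mbit/s links. Not a number reported in the paper.
    speedup = 1.15          # reported end-to-end speed-up
    comm_reduction = 0.148  # reported reduction in communication demands
    time_saved = 1 - 1 / speedup                # ~0.13 of total run time
    comm_share = time_saved / comm_reduction    # ~0.88
    print(f"implied communication share of run time: {comm_share:.2f}")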

This work was funded in part by the National Science Foundation (NSF) in the USA under grant CNS-1421642 and by the German Federal Ministry of Education and Research (BMBF) under grant number 01IS17028F as part of the ITEA3 project COMPACT (reference number 16018).


Author information


Corresponding author

Correspondence to Rafael Stahl.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Stahl, R., Zhao, Z., Mueller-Gritschneder, D., Gerstlauer, A., Schlichtmann, U. (2019). Fully Distributed Deep Learning Inference on Resource-Constrained Edge Devices. In: Pnevmatikatos, D., Pelcat, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2019. Lecture Notes in Computer Science, vol 11733. Springer, Cham. https://doi.org/10.1007/978-3-030-27562-4_6

  • DOI: https://doi.org/10.1007/978-3-030-27562-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27561-7

  • Online ISBN: 978-3-030-27562-4

  • eBook Packages: Computer Science, Computer Science (R0)
