Fully Distributed Deep Learning Inference on Resource-Constrained Edge Devices

  • Conference paper

Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11733)

Abstract

Performing the inference tasks of deep learning applications on IoT edge devices preserves the privacy of input data and can result in lower latency than a cloud solution. As most edge devices are memory- and compute-constrained, they cannot store and execute a complete Deep Neural Network (DNN). One possible solution is to distribute the DNN across multiple edge devices. For a complete distribution, weight-intensive fully-connected layers as well as feature- and weight-intensive convolutional layers need to be partitioned to reduce the amount of computation and data on each resource-constrained edge device. At the same time, the resulting communication overheads need to be considered. Existing work on distributed DNN execution either cannot support all types of networks and layers or does not account for layer fusion opportunities to reduce communication. In this paper, we jointly optimize the memory, computation, and communication demands for distributed execution of complete neural networks covering all layers. This is achieved through techniques that combine feature and weight partitioning with a communication-aware layer fusion approach, enabling holistic optimization across layers. For a given number of edge devices, the schemes are applied jointly such that the amount of data exchanged between devices, and thus run time, is minimized.
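
To make the two partitioning schemes and the role of layer fusion concrete, the following minimal Python sketch (our own illustration, not the paper's implementation; all function and variable names are hypothetical) shows weight partitioning of a fully-connected layer and how fusing stacked convolutional layers enlarges each device's input tile in exchange for fewer inter-device data exchanges:

    # Illustrative sketch only, not the paper's code.
    import numpy as np

    def fused_input_rows(out_rows, kernel=3, stride=1, fused_layers=1):
        # Input rows a device needs to produce `out_rows` rows of its output
        # feature-map slice after `fused_layers` stacked convolutions
        # (padding ignored for clarity). Fusing grows the overlap ("halo")
        # once, but boundary data is exchanged once instead of per layer.
        rows = out_rows
        for _ in range(fused_layers):
            rows = (rows - 1) * stride + kernel
        return rows

    def partition_fc_weights(weights, num_devices):
        # Weight partitioning: split a fully-connected weight matrix of
        # shape (out_features, in_features) by output neurons, so each
        # device stores and applies only its slice of the weights.
        return np.array_split(weights, num_devices, axis=0)

    # Feature partitioning: each of 6 devices computes a 70-row horizontal
    # slice of an output feature map.
    print(fused_input_rows(70, fused_layers=1))  # 72 rows, exchange after every layer
    print(fused_input_rows(70, fused_layers=3))  # 76 rows, one exchange for 3 layers

The sketch exposes the trade-off the abstract describes: fusion slightly increases redundant computation and storage at tile borders, but reduces how often and how much the devices must communicate.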

Experimental results for a simulation of six edge devices with 100 Mbit/s connections running the YOLOv2 DNN model show that the schemes evenly balance the memory footprint across devices. Integrating layer fusion additionally reduces communication demands by 14.8%, which speeds up the inference task by 1.15x compared to partitioning without fusion.
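
As a back-of-envelope sanity check (our own reading of the reported numbers, not a figure from the paper), an Amdahl-style estimate under the assumption that fusion changed only communication time suggests that communication dominates the un-fused run time in this setup:

    # Back-of-envelope estimate; assumes fusion changed only communication
    # time and that transfer time scales with data volume on the fixed
    # 100 Mbit/s links. Not a number reported in the paper.
    speedup = 1.15          # reported end-to-end speed-up
    comm_reduction = 0.148  # reported reduction in communication demands
    time_saved = 1 - 1 / speedup                # ~0.13 of total run time
    comm_share = time_saved / comm_reduction    # ~0.88
    print(f"implied communication share of run time: {comm_share:.2f}")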

This work was funded in part by the National Science Foundation (NSF) in the USA under grant CNS-1421642 and by the German Federal Ministry of Education and Research (BMBF) under grant number 01IS17028F as part of the ITEA3 project COMPACT (reference number 16018).


Author information


Corresponding author

Correspondence to Rafael Stahl.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Stahl, R., Zhao, Z., Mueller-Gritschneder, D., Gerstlauer, A., Schlichtmann, U. (2019). Fully Distributed Deep Learning Inference on Resource-Constrained Edge Devices. In: Pnevmatikatos, D., Pelcat, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2019. Lecture Notes in Computer Science, vol 11733. Springer, Cham. https://doi.org/10.1007/978-3-030-27562-4_6

  • DOI: https://doi.org/10.1007/978-3-030-27562-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27561-7

  • Online ISBN: 978-3-030-27562-4

  • eBook Packages: Computer Science, Computer Science (R0)
