On designing the adaptive computation framework of distributed deep learning models for Internet-of-Things applications

The Journal of Supercomputing

Abstract

Deep learning methods have been gradually adopted in Internet-of-Things (IoT) applications. Nevertheless, their large demands for computation and memory resources make these methods difficult to deploy in the field, as resource-constrained IoT devices would be overwhelmed by the computations incurred by the inference operations of the deployed deep learning models. In this article, we propose an adaptive computation framework, built on top of distributed deep neural networks (DDNNs), that allows the inference computations of a trained DDNN model to be executed collaboratively by the machines in a distributed computing hierarchy, e.g., an end device and a cloud server. By enabling trained models to run on actual distributed systems, the proposed framework supports the co-design of distributed deep learning models and systems: the performance delivered by a model on a system, in terms of inference time, energy consumption, and model accuracy, can be measured and serves as the input to the next design cycle for the model and system. We have built a surveillance system for the object detection application with the prototyped framework, and we use this system as a case study to demonstrate the capabilities of the proposed framework. In addition, we share the design considerations involved in developing the DDNN system. Given the promising results presented in this article, we believe the framework paves the way toward an automated design process for distributed deep learning models and systems.

Notes

  1. The compatibility of the proposed framework with the existing DNN frameworks is discussed in Sect. 4.1.

  2. The messages exchanged between the device and the server fall into two groups. First, the messages sent from the server to the device are 1) the program executable (generated C code) of the trained model, which defines the DNN network structure and the trained parameters, and 2) the final predictions, when the device asks the server to help perform the computations. Second, the message sent from the device to the server is the acceleration request, which contains the name of the DDNN model to be accelerated and the intermediate data for the further computations.
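
     For illustration, the following minimal sketch encodes the two message groups as JSON payloads. The encoding and every field name (model, intermediate, prediction) are our assumptions; the article does not specify a wire format.

        import json

        # Device -> server: acceleration request naming the DDNN model and
        # carrying the intermediate data for the remaining computations.
        acceleration_request = json.dumps({
            "model": "surveillance_ddnn",         # hypothetical model name
            "intermediate": [0.12, -0.53, 0.98],  # output of the device-side layers
        })

        # Server -> device: the final prediction for an offloaded request.
        final_prediction = json.dumps({
            "model": "surveillance_ddnn",
            "prediction": "person",               # hypothetical class label
        })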

  3. To shorten the latency of each request, the subscription could be performed earlier during system initialization.
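
     As a sketch of this optimization, assuming an MQTT transport with the paho-mqtt 1.x Python client (the broker address and topic name below are hypothetical):

        import paho.mqtt.client as mqtt

        def on_message(client, userdata, message):
            # Handle a final prediction pushed back by the server.
            print("prediction:", message.payload.decode())

        client = mqtt.Client()                # paho-mqtt 1.x constructor
        client.on_message = on_message
        client.connect("broker.local", 1883)  # hypothetical broker address
        # Subscribing once at system initialization removes the subscription
        # round trip from the critical path of later acceleration requests.
        client.subscribe("ddnn/predictions/device-0")
        client.loop_start()                   # process traffic in the background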

  4. From the related study [31], where several MQTT implementations were benchmarked, the MQTT servers were able to sustain several thousand publishing connections on a machine with two CPU cores and 4 GB of RAM over a gigabit network. For some MQTT implementations, the CPU utilization stayed steady at 50% while handling several thousand connections.

  5. \(T_{\mathrm{Total}} \approx T_{\mathrm{Device}} + T_{\mathrm{Server}} + T_{\mathrm{Comm}}\), where each term is the corresponding time averaged over the 10,000 samples.
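
     In code form, the decomposition amounts to averaging each component over the same samples and summing the averages; a minimal sketch with hypothetical timings:

        def averaged_total(device_times, server_times, comm_times):
            # T_Total ~= mean(T_Device) + mean(T_Server) + mean(T_Comm),
            # each mean taken over the same samples (10,000 in the experiments).
            n = len(device_times)
            return (sum(device_times) + sum(server_times) + sum(comm_times)) / n

        # Hypothetical per-request timings in milliseconds:
        print(averaged_total([5.0, 7.0], [20.0, 22.0], [3.0, 5.0]))  # 31.0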

  6. When the server loading is low (i.e., when a small threshold value is given), the maximum number of end devices that the server can serve increases. When the loading is high, the server has difficulty serving more end devices.

  7. The threshold values are set to 0.4 and 0.8 since we want to present the impact of the entropy threshold values on the local exit rates. Different threshold values could be applied depending on the system design considerations.
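
     The sketch below shows an entropy-based exit rule of the kind used in BranchyNet-style systems: a sample exits at the local (device-side) branch when the entropy of its softmax output falls below the threshold. Normalizing the entropy to [0, 1] is our assumption; implementations differ on this detail.

        import math

        def exits_locally(probs, threshold=0.4):
            # Shannon entropy of the softmax output at the device-side exit.
            entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
            # Normalize by the maximum entropy, log(num_classes).
            normalized = entropy / math.log(len(probs))
            return normalized < threshold  # confident enough to exit locally

        # A peaked distribution exits at the device; a flat one is offloaded.
        print(exits_locally([0.97, 0.01, 0.01, 0.01]))  # True
        print(exits_locally([0.25, 0.25, 0.25, 0.25]))  # False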

  8. The Device&Server configuration is adopted in the experiments.

  9. We adopted pocl (Portable Computing Language) as the OpenCL runtime for the multicore processor; pocl uses Pthreads (POSIX Threads) to implement the OpenCL standard.

References

  1. Amroun H, Mhamed Hamy T, Ammi M (2017) DNN-based approach for identification of the level of attention of the TV-viewers using IoT network. In: Proceedings of the IEEE Conference on SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation, pp 1–4

  2. Aziz B (2014) A formal model and analysis of the MQ Telemetry Transport protocol. In: Proceedings of the 9th International Conference on Availability, Reliability and Security, pp 59–68

  3. Bang S, Wang J, Li Z, Gao C, Kim Y, Dong Q, Chen YP, Fick L, Sun X, Dreslinski R, et al (2017) 14.7 A 288 μW programmable deep-learning processor with 270 KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence. In: Proceedings of the IEEE International Solid-State Circuits Conference, pp 250–251

  4. Bort J (2016) The Google Brain is a real thing but very few people have seen it. https://www.businessinsider.com/what-is-google-brain-2016-9

  5. Cheng MH, Sun Q, Tu CH (2018) An adaptive computation framework of distributed deep learning models for internet-of-things applications. In: Proceedings of the 24th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pp 85–91

  6. Ciregan D, Meier U, Schmidhuber J (2012) Multi-column deep neural networks for image classification. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp 3642–3649

  7. Courbariaux M, Bengio Y, David JP (2015) Binaryconnect: training deep neural networks with binary weights during propagations. In: Proceedings of the 28th International Conference on Neural Information Processing Systems, pp 3123–3131

  8. Courbariaux M, Hubara I, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or -1. CoRR abs/1602.02830

  9. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp 248–255

  10. Fernando N, Loke SW, Rahayu W (2013) Mobile cloud computing: a survey. Future Gen Comput Syst 29(1):84–106

  11. Google LLC (2018) Vision API - image content analysis, Cloud Vision API, Google Cloud. https://cloud.google.com/vision/

  12. Hadidi R, Cao J, Woodward M, Ryoo MS, Kim H (2018) Musical chair: efficient real-time recognition using collaborative IoT devices. CoRR abs/1802.02138

  13. Hong S, Park Y (2017) An FPGA-based neural accelerator for small IoT devices. In: Proceedings of the 2017 International SoC Design Conference, pp 294–295

  14. Hu D, Krishnamachari B (2020) Fast and accurate streaming CNN inference via communication compression on the edge. In: Proceedings of the Fifth IEEE/ACM International Conference on Internet-of-Things Design and Implementation (IoTDI), IEEE, pp 157–163

  15. Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks. In: Proceedings of the 29th International Conference on Neural Information Processing Systems, pp 4107–4115

  16. Hung S, Tzeng T, Wu J, Tsai M, Lu Y, Shieh J, Tu C, Ho W (2014) MobileFBP: designing portable reconfigurable applications for heterogeneous systems. J Syst Archit Embed Syst Des 60(1):40–51. https://doi.org/10.1016/j.sysarc.2013.11.009

  17. Iandola FN, Moskewicz MW, Ashraf K, Han S, Dally WJ, Keutzer K (2016) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. CoRR abs/1602.07360

  18. Jindal V (2016) Mobilesoft: U: a deep learning framework to monitor heart rate during intensive physical exercise. https://src.acm.org/binaries/content/assets/src/2016/vasujindal.pdf

  19. Jouppi N (2016) Google supercharges machine learning tasks with TPU custom chip. https://cloudplatform.googleblog.com/2016/05/google-supercharges-machine-learning-tasks-with-custom-chip.html

  20. Kang Y, Hauswald J, Gao C, Rovinski A, Mudge T, Mars J, Tang L (2017) Neurosurgeon: collaborative intelligence between the cloud and mobile edge. ACM SIGPLAN Not 52(4):615–629

  21. Kim YD, Park E, Yoo S, Choi T, Yang L, Shin D (2015) Compression of deep convolutional neural networks for fast and low power mobile applications. CoRR abs/1511.06530

  22. Krizhevsky A, Nair V, Hinton G (2014) The CIFAR-10 dataset. http://www.cs.toronto.edu/~kriz/cifar.html

  23. Kumar K, Lu YH (2010) Cloud computing for mobile users: can offloading computation save energy? Computer 43(4):51–56

  24. Lee CL, Hsu MY, Lu BS, Hung MY, Lee JK (2020) Experiment and enabled flow for GPGPU-sim simulators with fixed-point instructions. J Syst Arch 111:101783

  25. Mao J, Chen X, Nixon KW, Krieger C, Chen Y (2017) MoDNN: local distributed mobile computing system for deep neural network. In: Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, pp 1396–1401

  26. Matsubara Y, Levorato M, Restuccia F (2021) Split computing and early exiting for deep learning applications: survey and research challenges. CoRR abs/2103.04505

  27. McDanel B, Teerapittayanon S, Kung HT (2017) Embedded binarized neural networks. In: Proceedings of the International Conference on Embedded Wireless Systems and Networks, pp 168–173

  28. Mohammadi M, Al-Fuqaha A, Sorour S, Guizani M (2018) Deep learning for IoT big data and streaming analytics: a survey. IEEE Commun Surv Tutor 20(4):2923–2960

  29. Nomi T (2018) Tiny-DNN documentation. https://media.readthedocs.org/pdf/tiny-dnn/latest/tiny-dnn.pdf

  30. Satyanarayanan M (2017) The emergence of edge computing. Computer 50(1):30–39

  31. Scalagent Distributed Technologies (2015) Benchmark of MQTT servers. http://www.scalagent.com/IMG/pdf/Benchmark_MQTT_servers-v1-1.pdf

  32. Scardapane S, Scarpiniti M, Baccarelli E, Uncini A (2020) Why should we add early exits to neural networks? Cogn Comput 12(5):954–966

  33. Tama BA, Rhee KH (2017) Attack classification analysis of IoT network via deep learning approach. Research Briefs on Information & Communication Technology Evolution

  34. Teerapittayanon S, McDanel B, Kung H (2016) Branchynet: fast inference via early exiting from deep neural networks. In: Proceedings of the 23rd International Conference on Pattern Recognition (ICPR), pp 2464–2469

  35. Teerapittayanon S, McDanel B, Kung HT (2017) Distributed deep neural networks over the cloud, the edge and end devices. In: Proceedings of the 37th IEEE International Conference on Distributed Computing Systems, pp 328–339. https://doi.org/10.1109/ICDCS.2017.226

  36. Truex S, Baracaldo N, Anwar A, Steinke T, Ludwig H, Zhang R, Zhou Y (2019) A hybrid approach to privacy-preserving federated learning. In: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, pp 1–11

  37. Tseng T, Hung S, Tu C (2015) Migratom.js: a JavaScript migration framework for distributed web computing and mobile devices. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing, pp 798–801. https://doi.org/10.1145/2695664.2695987

  38. Wang H, Yurochkin M, Sun Y, Papailiopoulos D, Khazaeni Y (2020) Federated learning with matched averaging. CoRR abs/2002.06440

  39. Wang Y, Li H, Li X (2016) Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices. In: Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pp 1–6

  40. Xiao L, Wan X, Lu X, Zhang Y, Wu D (2018) IoT security techniques based on machine learning: how do IoT devices use AI to enhance security? IEEE Signal Process Mag 35(5):41–49

  41. Xu Z, Cheung RC (2020) Binary convolutional neural network acceleration framework for rapid system prototyping. J Syst Arch 109:101762

  42. Yang K, Ou S, Chen HH (2008) On effective offloading services for resource-constrained mobile devices running heavier mobile internet applications. IEEE Commun Mag 46(1):56–63

  43. Yi S, Hao Z, Qin Z, Li Q (2015) Fog computing: platform and applications. In: Proceedings of the IEEE Workshop on Hot Topics in Web Systems and Technologies, pp 73–78

  44. Zhang S, Zhang S, Qian Z, Wu J, Jin Y, Lu S (2021) DeepSlicing: collaborative and adaptive CNN inference with low latency. IEEE Trans Parallel Distrib Syst

  45. Zhang X, Ramachandran A, Zhuge C, He D, Zuo W, Cheng Z, Rupnow K, Chen D (2017) Machine learning on FPGAs to face the IoT revolution. In: Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, pp 819–826

Acknowledgements

This work is financially supported in part by the Ministry of Science and Technology of Taiwan under Grant Number 107-2221-E-006-045-MY3. This work is also financially supported by the Intelligent Manufacturing Research Center (iMRC) through the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project of the Ministry of Education (MOE) in Taiwan.

Author information

Corresponding author

Correspondence to QiHui Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tu, CH., Sun, Q. & Cheng, MH. On designing the adaptive computation framework of distributed deep learning models for Internet-of-Things applications. J Supercomput 77, 13191–13223 (2021). https://doi.org/10.1007/s11227-021-03795-4
