On designing the adaptive computation framework of distributed deep learning models for Internet-of-Things applications

The Journal of Supercomputing

Abstract

Deep learning methods have been gradually adopted in Internet-of-Things (IoT) applications. Nevertheless, their large demands for computation and memory resources make these methods difficult to deploy in the field, as resource-constrained IoT devices would be overwhelmed by the computations incurred by the inference operations of the deployed deep learning models. In this article, we propose an adaptive computation framework, built on top of distributed deep neural networks (DDNNs), that allows the inference computations of a trained DDNN model to be executed collaboratively by the machines in a distributed computing hierarchy, e.g., an end device and a cloud server. By enabling trained models to run on actual distributed systems, the proposed framework supports the co-design of distributed deep learning models and systems: the performance delivered by a model on a system, in terms of inference time, energy consumption, and model accuracy, can be measured and serves as the input to the next design cycle for the model and system. We have built a surveillance system for the object detection application with the prototyped framework, and we use this system as a case study to demonstrate the capabilities of the proposed framework. In addition, we share the design considerations involved in developing the DDNN system. Given the promising results presented in this article, we believe the framework paves the way toward an automated design process for distributed deep learning models and systems.

Notes

  1. The compatibility of the proposed framework with the existing DNN frameworks is discussed in Sect. 4.1.

  2. The messages exchanged between the device and the server fall into two groups. First, the messages sent from the server to the device are 1) the program executable (generated C code) of the trained model, which defines the DNN network structure and the trained parameters, and 2) the final predictions, when the device asks the server to help perform the computations. Second, the message sent from the device to the server is the acceleration request, which contains the name of the DDNN model to be accelerated and the intermediate data for the further computations.
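
     For illustration, the following minimal sketch encodes the two message groups as JSON payloads. The encoding and every field name (model, intermediate, prediction) are our assumptions; the article does not specify a wire format.

        import json

        # Device -> server: acceleration request naming the DDNN model and
        # carrying the intermediate data for the remaining computations.
        acceleration_request = json.dumps({
            "model": "surveillance_ddnn",         # hypothetical model name
            "intermediate": [0.12, -0.53, 0.98],  # output of the device-side layers
        })

        # Server -> device: the final prediction for an offloaded request.
        final_prediction = json.dumps({
            "model": "surveillance_ddnn",
            "prediction": "person",               # hypothetical class label
        })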

  3. To shorten the latency of each request, the subscription could be performed earlier during system initialization.
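
     As a sketch of this optimization, assuming an MQTT transport with the paho-mqtt 1.x Python client (the broker address and topic name below are hypothetical):

        import paho.mqtt.client as mqtt

        def on_message(client, userdata, message):
            # Handle a final prediction pushed back by the server.
            print("prediction:", message.payload.decode())

        client = mqtt.Client()                # paho-mqtt 1.x constructor
        client.on_message = on_message
        client.connect("broker.local", 1883)  # hypothetical broker address
        # Subscribing once at system initialization removes the subscription
        # round trip from the critical path of later acceleration requests.
        client.subscribe("ddnn/predictions/device-0")
        client.loop_start()                   # process traffic in the background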

  4. From the related study [31], where several MQTT implementations were benchmarked, the MQTT servers were able to sustain several thousand publishing connections on a machine with two CPU cores and 4 GB of RAM over a gigabit network. For some MQTT implementations, the CPU utilization stayed steady at 50% while handling several thousand connections.

  5. \(T_{\mathrm{Total}} \approx T_{\mathrm{Device}} + T_{\mathrm{Server}} + T_{\mathrm{Comm}}\), where each term is the corresponding time averaged over the 10,000 samples.
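
     In code form, the decomposition amounts to averaging each component over the same samples and summing the averages; a minimal sketch with hypothetical timings:

        def averaged_total(device_times, server_times, comm_times):
            # T_Total ~= mean(T_Device) + mean(T_Server) + mean(T_Comm),
            # each mean taken over the same samples (10,000 in the experiments).
            n = len(device_times)
            return (sum(device_times) + sum(server_times) + sum(comm_times)) / n

        # Hypothetical per-request timings in milliseconds:
        print(averaged_total([5.0, 7.0], [20.0, 22.0], [3.0, 5.0]))  # 31.0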

  6. When the server loading is low (i.e., when a small threshold value is given), the maximum number of end devices that the server can serve increases. When the loading is high, the server has difficulty serving more end devices.

  7. The threshold values are set to 0.4 and 0.8 since we want to present the impact of the entropy threshold values on the local exit rates. Different threshold values could be applied depending on the system design considerations.
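
     The sketch below shows an entropy-based exit rule of the kind used in BranchyNet-style systems: a sample exits at the local (device-side) branch when the entropy of its softmax output falls below the threshold. Normalizing the entropy to [0, 1] is our assumption; implementations differ on this detail.

        import math

        def exits_locally(probs, threshold=0.4):
            # Shannon entropy of the softmax output at the device-side exit.
            entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
            # Normalize by the maximum entropy, log(num_classes).
            normalized = entropy / math.log(len(probs))
            return normalized < threshold  # confident enough to exit locally

        # A peaked distribution exits at the device; a flat one is offloaded.
        print(exits_locally([0.97, 0.01, 0.01, 0.01]))  # True
        print(exits_locally([0.25, 0.25, 0.25, 0.25]))  # False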

  8. The Device&Server configuration is adopted in the experiments.

  9. We adopted pocl (Portable Computing Language) as the OpenCL runtime for the multicore processor; pocl uses Pthreads (POSIX Threads) to implement the OpenCL standard.

References

  1. Amroun H, Mhamed Hamy T, Ammi M (2017) DNN-based approach for identification of the level of attention of the TV-viewers using IoT network. In: Proceedings of the IEEE Conference on SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation, pp 1–4

  2. Aziz B (2014) A formal model and analysis of the MQ Telemetry Transport protocol. In: Proceedings of the 9th International Conference on Availability, Reliability and Security, pp 59–68

  3. Bang S, Wang J, Li Z, Gao C, Kim Y, Dong Q, Chen YP, Fick L, Sun X, Dreslinski R, et al (2017) 14.7 A 288 μW programmable deep-learning processor with 270 KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence. In: Proceedings of the IEEE International Solid-State Circuits Conference, pp 250–251

  4. Bort J (2016) The Google Brain is a real thing but very few people have seen it. https://www.businessinsider.com/what-is-google-brain-2016-9

  5. Cheng MH, Sun Q, Tu CH (2018) An adaptive computation framework of distributed deep learning models for internet-of-things applications. In: Proceedings of the 24th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pp 85–91

  6. Ciregan D, Meier U, Schmidhuber J (2012) Multi-column deep neural networks for image classification. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp 3642–3649

  7. Courbariaux M, Bengio Y, David JP (2015) Binaryconnect: training deep neural networks with binary weights during propagations. In: Proceedings of the 28th International Conference on Neural Information Processing Systems, pp 3123–3131

  8. Courbariaux M, Hubara I, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or -1. CoRR abs/1602.02830

  9. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp 248–255

  10. Fernando N, Loke SW, Rahayu W (2013) Mobile cloud computing: a survey. Future Gen Comput Syst 29(1):84–106

  11. Google LLC (2018) Vision API - image content analysis, Cloud Vision API, Google Cloud. https://cloud.google.com/vision/

  12. Hadidi R, Cao J, Woodward M, Ryoo MS, Kim H (2018) Musical chair: efficient real-time recognition using collaborative IoT devices. CoRR abs/1802.02138

  13. Hong S, Park Y (2017) An FPGA-based neural accelerator for small IoT devices. In: Proceedings of the 2017 International SoC Design Conference, pp 294–295

  14. Hu D, Krishnamachari B (2020) Fast and accurate streaming CNN inference via communication compression on the edge. In: Proceedings of the Fifth IEEE/ACM International Conference on Internet-of-Things Design and Implementation (IoTDI), IEEE, pp 157–163

  15. Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks. In: Proceedings of the 29th International Conference on Neural Information Processing Systems, pp 4107–4115

  16. Hung S, Tzeng T, Wu J, Tsai M, Lu Y, Shieh J, Tu C, Ho W (2014) MobileFBP: designing portable reconfigurable applications for heterogeneous systems. J Syst Archit Embed Syst Des 60(1):40–51. https://doi.org/10.1016/j.sysarc.2013.11.009

  17. Iandola FN, Moskewicz MW, Ashraf K, Han S, Dally WJ, Keutzer K (2016) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. CoRR abs/1602.07360

  18. Jindal V (2016) Mobilesoft: U: a deep learning framework to monitor heart rate during intensive physical exercise. https://src.acm.org/binaries/content/assets/src/2016/vasujindal.pdf

  19. Jouppi N (2016) Google supercharges machine learning tasks with TPU custom chip. https://cloudplatform.googleblog.com/2016/05/google-supercharges-machine-learning-tasks-with-custom-chip.html

  20. Kang Y, Hauswald J, Gao C, Rovinski A, Mudge T, Mars J, Tang L (2017) Neurosurgeon: collaborative intelligence between the cloud and mobile edge. ACM SIGPLAN Not 52(4):615–629

  21. Kim YD, Park E, Yoo S, Choi T, Yang L, Shin D (2015) Compression of deep convolutional neural networks for fast and low power mobile applications. CoRR abs/1511.06530

  22. Krizhevsky A, Nair V, Hinton G (2014) The CIFAR-10 dataset. http://www.cs.toronto.edu/~kriz/cifar.html

  23. Kumar K, Lu YH (2010) Cloud computing for mobile users: can offloading computation save energy? Computer 43(4):51–56

  24. Lee CL, Hsu MY, Lu BS, Hung MY, Lee JK (2020) Experiment and enabled flow for GPGPU-sim simulators with fixed-point instructions. J Syst Arch 111:101783

  25. Mao J, Chen X, Nixon KW, Krieger C, Chen Y (2017) MoDNN: local distributed mobile computing system for deep neural network. In: Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, pp 1396–1401

  26. Matsubara Y, Levorato M, Restuccia F (2021) Split computing and early exiting for deep learning applications: survey and research challenges. CoRR abs/2103.04505

  27. McDanel B, Teerapittayanon S, Kung HT (2017) Embedded binarized neural networks. In: Proceedings of the International Conference on Embedded Wireless Systems and Networks, pp 168–173

  28. Mohammadi M, Al-Fuqaha A, Sorour S, Guizani M (2018) Deep learning for IoT big data and streaming analytics: a survey. IEEE Commun Surv Tutor 20(4):2923–2960

  29. Nomi T (2018) Tiny-DNN documentation. https://media.readthedocs.org/pdf/tiny-dnn/latest/tiny-dnn.pdf

  30. Satyanarayanan M (2017) The emergence of edge computing. Computer 50(1):30–39

  31. Scalagent Distributed Technologies (2015) Benchmark of MQTT servers. http://www.scalagent.com/IMG/pdf/Benchmark_MQTT_servers-v1-1.pdf

  32. Scardapane S, Scarpiniti M, Baccarelli E, Uncini A (2020) Why should we add early exits to neural networks? Cogn Comput 12(5):954–966

  33. Tama BA, Rhee KH (2017) Attack classification analysis of IoT network via deep learning approach. Research Briefs on Information & Communication Technology Evolution

  34. Teerapittayanon S, McDanel B, Kung H (2016) Branchynet: fast inference via early exiting from deep neural networks. In: Proceedings of the 23rd International Conference on Pattern Recognition (ICPR), pp 2464–2469

  35. Teerapittayanon S, McDanel B, Kung HT (2017) Distributed deep neural networks over the cloud, the edge and end devices. In: Proceedings of the 37th IEEE International Conference on Distributed Computing Systems, pp 328–339. https://doi.org/10.1109/ICDCS.2017.226

  36. Truex S, Baracaldo N, Anwar A, Steinke T, Ludwig H, Zhang R, Zhou Y (2019) A hybrid approach to privacy-preserving federated learning. In: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, pp 1–11

  37. Tseng T, Hung S, Tu C (2015) Migratom.js: a JavaScript migration framework for distributed web computing and mobile devices. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing, pp 798–801. https://doi.org/10.1145/2695664.2695987

  38. Wang H, Yurochkin M, Sun Y, Papailiopoulos D, Khazaeni Y (2020) Federated learning with matched averaging. CoRR abs/2002.06440

  39. Wang Y, Li H, Li X (2016) Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices. In: Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pp 1–6

  40. Xiao L, Wan X, Lu X, Zhang Y, Wu D (2018) IoT security techniques based on machine learning: how do IoT devices use AI to enhance security? IEEE Signal Process Mag 35(5):41–49

  41. Xu Z, Cheung RC (2020) Binary convolutional neural network acceleration framework for rapid system prototyping. J Syst Arch 109:101762

  42. Yang K, Ou S, Chen HH (2008) On effective offloading services for resource-constrained mobile devices running heavier mobile internet applications. IEEE Commun Mag 46(1):56–63

  43. Yi S, Hao Z, Qin Z, Li Q (2015) Fog computing: platform and applications. In: Proceedings of the IEEE Workshop on Hot Topics in Web Systems and Technologies, pp 73–78

  44. Zhang S, Zhang S, Qian Z, Wu J, Jin Y, Lu S (2021) DeepSlicing: collaborative and adaptive CNN inference with low latency. IEEE Trans Parallel Distrib Syst

  45. Zhang X, Ramachandran A, Zhuge C, He D, Zuo W, Cheng Z, Rupnow K, Chen D (2017) Machine learning on FPGAs to face the IoT revolution. In: Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, pp 819–826

Acknowledgements

This work is financially supported in part by the Ministry of Science and Technology of Taiwan under Grant Number 107-2221-E-006-045-MY3. This work is also financially supported by the Intelligent Manufacturing Research Center (iMRC) through the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project of the Ministry of Education (MOE) in Taiwan.

Author information

Corresponding author

Correspondence to QiHui Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tu, CH., Sun, Q. & Cheng, MH. On designing the adaptive computation framework of distributed deep learning models for Internet-of-Things applications. J Supercomput 77, 13191–13223 (2021). https://doi.org/10.1007/s11227-021-03795-4
