Abstract
In this study, the authors explore the performance of several Large Language Models, namely BART-Base, GPT-Neo and DistilGPT-2, across different hardware platforms. The models are fine-tuned on a general-purpose dataset and evaluated on systems with varying computing capabilities, from high-end servers and cloud infrastructure to resource-constrained embedded devices. The main objective is to measure how quickly each model processes a given input, the accuracy of its text summarisation, and the similarity between the machine-generated translations and the reference translations. The novelty of this research lies in quantifying the trade-off between processing speed and output quality. This approach aims to determine which model and system combination performs best for future deployment on Internet of Things (IoT) devices.
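To illustrate the kind of evaluation the abstract describes, the following is a minimal sketch (not the authors' actual benchmark) that times summary generation for one of the evaluated model families using the Hugging Face transformers library and scores the output with ROUGE. The checkpoint name, input document and reference summary are illustrative assumptions.

```python
# Sketch: measure inference latency and summarisation quality for one model.
# Checkpoint, document and reference text are placeholders, not the paper's data.
import time
from transformers import pipeline
from rouge_score import rouge_scorer

# Assumed checkpoint; the paper fine-tunes its own variants of BART-Base et al.
summarizer = pipeline("summarization", model="facebook/bart-base")

document = "The Internet of Things connects billions of resource-constrained devices ..."
reference = "IoT connects billions of resource-constrained devices."

start = time.perf_counter()
summary = summarizer(document, max_length=40, min_length=5, do_sample=False)[0]["summary_text"]
latency = time.perf_counter() - start

# ROUGE compares the generated summary against the reference text.
scores = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True).score(reference, summary)
print(f"latency: {latency:.3f}s  ROUGE-1 F1: {scores['rouge1'].fmeasure:.3f}")
```

Running the same loop on a server GPU and on an embedded board, and averaging over a test set, yields the speed-versus-quality comparison the study targets.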
Acknowledgement
This research was partially supported by the FSS project "SmartBits Robotics - Creation and Development of Ideas and Technologies" - 2182.