Abstract
In this study, the authors explore the performance of several Large Language Models, namely BART-Base, GPT-Neo and DistilGPT-2, across different hardware platforms. The models are fine-tuned on a general-purpose dataset and evaluated on systems with varying computing capabilities, from high-end servers and cloud infrastructure to resource-constrained embedded devices. The main objective is to measure how quickly each model processes a given input, the accuracy of its text summarisation, and the similarity between the machine-generated translations and the reference translations. The novelty of this research lies in quantifying the trade-off between processing speed and output quality. This approach aims to determine which model and system combination performs best for future deployment on Internet of Things (IoT) devices.
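To illustrate the kind of evaluation the abstract describes, the following is a minimal sketch (not the authors' actual benchmark) that times summary generation for one of the evaluated model families using the Hugging Face transformers library and scores the output with ROUGE. The checkpoint name, input document and reference summary are illustrative assumptions.

```python
# Sketch: measure inference latency and summarisation quality for one model.
# Checkpoint, document and reference text are placeholders, not the paper's data.
import time
from transformers import pipeline
from rouge_score import rouge_scorer

# Assumed checkpoint; the paper fine-tunes its own variants of BART-Base et al.
summarizer = pipeline("summarization", model="facebook/bart-base")

document = "The Internet of Things connects billions of resource-constrained devices ..."
reference = "IoT connects billions of resource-constrained devices."

start = time.perf_counter()
summary = summarizer(document, max_length=40, min_length=5, do_sample=False)[0]["summary_text"]
latency = time.perf_counter() - start

# ROUGE compares the generated summary against the reference text.
scores = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True).score(reference, summary)
print(f"latency: {latency:.3f}s  ROUGE-1 F1: {scores['rouge1'].fmeasure:.3f}")
```

Running the same loop on a server GPU and on an embedded board, and averaging over a test set, yields the speed-versus-quality comparison the study targets.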
Acknowledgement
This research was partially supported by the FSS project "SmartBits Robotics - Creation and Development of Ideas and Technologies" - 2182.