Abstract
Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy and computational efficiency. Toward efficient wireless LLM inference at the edge, this study comprehensively analyzes the impact of different splitting points in mainstream open-source LLMs. Building on this analysis, it introduces a framework inspired by model-based reinforcement learning (MBRL) to determine the optimal splitting point across the edge server and user equipment (UE). By incorporating a reward surrogate model, the approach significantly reduces the computational cost of frequent performance evaluations. Extensive simulations demonstrate that the method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.
Data availability
The data that support the findings of this study are available in the zjunice GitHub repository at https://github.com/zjunice.
Author information
Contributions
Yuxuan CHEN and Rongpeng LI designed the research. Yuxuan CHEN performed the simulation, processed the data, and drafted the paper. Rongpeng LI and Honggang ZHANG helped organize the paper. Honggang ZHANG, Zhifeng ZHAO, and Xiaoxue YU revised the paper. Yuxuan CHEN and Rongpeng LI finalized the paper.
Ethics declarations
Honggang ZHANG is a guest editor of this special issue; he was not involved with the peer review process of this paper. All the authors declare that they have no conflict of interest.
Additional information
Project supported by the National Key Research and Development Program of China (No. 2024YFE0200600), the National Natural Science Foundation of China (No. 62071425), the Zhejiang Key Research and Development Plan, China (No. 2022C01093), the Zhejiang Provincial Natural Science Foundation of China (No. LR23F010005), the National Key Laboratory of Wireless Communications Foundation, China (No. 2023KP01601), and the Big Data and Intelligent Computing Key Lab of CQUPT, China (No. BDIC-2023-B-001).
Cite this article
Chen, Y., Li, R., Yu, X. et al. Adaptive layer splitting for wireless large language model inference in edge computing: a model-based reinforcement learning approach. Front Inform Technol Electron Eng 26, 278–292 (2025). https://doi.org/10.1631/FITEE.2400468
Key words
- Large language models (LLMs)
- Edge computing
- Model-based reinforcement learning (MBRL)
- Split inference
- Transformer