Abstract
In this research, we conduct a systematic mapping study on Large Language Models (LLMs) for Software Engineering (SE). The significantly enhanced capabilities of LLMs have led to their use in many fields, including the important domain of SE. SE processes involve numerous artifacts, such as code, requirements, and documentation, which can serve as input to LLMs. To determine the potential applications of LLMs in SE, it is crucial to understand their capabilities. This systematic mapping study therefore explores the capabilities and potential of LLMs in SE tasks. It also addresses issues associated with LLMs, such as their non-deterministic behaviour and their tendency to hallucinate. The study is intended as a resource for software developers, researchers, and practitioners interested in the intersection of artificial intelligence and SE, guiding their decisions on integrating these technologies.
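To make the non-determinism issue mentioned above concrete, the following minimal sketch (not taken from the paper) shows how temperature-based sampling over a toy next-token distribution yields varying outputs across runs, whereas greedy decoding is deterministic. The vocabulary and model scores are invented for illustration only.

```python
# Minimal sketch (illustrative, not the paper's method): temperature sampling
# over a toy next-token distribution, showing why LLM output can vary.
import math
import random

def sample_next_token(logits, temperature, rng):
    """Sample a token index from softmax(logits / temperature).

    temperature == 0 degenerates to greedy (argmax) decoding, which is
    deterministic; for temperature > 0, repeated calls on the same input
    can return different tokens.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

vocab = ["return", "print", "assert", "raise"]   # invented toy vocabulary
logits = [2.1, 1.9, 0.3, -1.0]                   # invented toy model scores

rng = random.Random()  # unseeded, mirroring independent generation requests
greedy = [vocab[sample_next_token(logits, 0.0, rng)] for _ in range(5)]
sampled = [vocab[sample_next_token(logits, 0.8, rng)] for _ in range(5)]
print("temperature=0.0:", greedy)   # always the same token
print("temperature=0.8:", sampled)  # may differ between runs
```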
Acknowledgment
This research is supported in part by SFI, Science Foundation Ireland (https://www.sfi.ie/) grant No SFI 13/RC/2094_P2 to Lero - the Science Foundation Ireland Research Centre for Software.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Görmez, M.K., Yılmaz, M., Clarke, P.M. (2024). Large Language Models for Software Engineering: A Systematic Mapping Study. In: Yilmaz, M., Clarke, P., Riel, A., Messnarz, R., Greiner, C., Peisl, T. (eds) Systems, Software and Services Process Improvement. EuroSPI 2024. Communications in Computer and Information Science, vol 2179. Springer, Cham. https://doi.org/10.1007/978-3-031-71139-8_5
DOI: https://doi.org/10.1007/978-3-031-71139-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71138-1
Online ISBN: 978-3-031-71139-8