Large Language Models for Software Engineering: A Systematic Mapping Study

  • Conference paper
  • Published in: Systems, Software and Services Process Improvement (EuroSPI 2024)

Abstract

In this research, we conduct a systematic mapping study on Large Language Models (LLMs) for Software Engineering (SE). The significantly enhanced capabilities of LLMs have led to their use in many fields, including the important domain of SE. SE processes involve numerous artifacts, such as code, requirements, and documentation, which can serve as input to LLMs. To determine the potential applications of LLMs in SE, it is crucial to understand their capabilities. This systematic mapping study therefore explores the capabilities and potential of LLMs in SE tasks. It also addresses issues associated with LLMs, such as their non-deterministic nature and their tendency to hallucinate. The study serves as a resource for software developers, researchers, and practitioners interested in the intersection of artificial intelligence and SE, guiding their decisions on integrating these technologies.
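
The non-determinism noted in the abstract stems from how LLMs generate text: the next token is typically sampled from a probability distribution rather than selected deterministically. The following minimal Python sketch is not taken from the paper; the toy vocabulary and logit values are invented for illustration. It shows why repeated generations can differ when the sampling temperature is above zero, while greedy decoding (temperature = 0) is repeatable.

    # Toy illustration of sampling-based (non-deterministic) token selection.
    import math
    import random

    def sample_token(logits, temperature):
        # Greedy decoding: always pick the highest-scoring token.
        if temperature == 0:
            return max(range(len(logits)), key=lambda i: logits[i])
        # Temperature-scaled softmax, then sample one index at random.
        scaled = [l / temperature for l in logits]
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]
        return random.choices(range(len(logits)), weights=weights, k=1)[0]

    vocab = ["return", "print", "raise", "pass"]   # invented toy vocabulary
    logits = [2.1, 1.9, 0.3, 0.1]                  # invented model scores

    for run in range(3):
        greedy = vocab[sample_token(logits, temperature=0.0)]
        sampled = vocab[sample_token(logits, temperature=1.0)]
        print(f"run {run}: greedy={greedy}  sampled={sampled}")

The greedy column is identical on every run, whereas the sampled column can vary from run to run. In practice this means that evaluations of LLM output for SE tasks should account for run-to-run variation, for example by repeating each prompt several times.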

Acknowledgment

This research is supported in part by SFI, Science Foundation Ireland (https://www.sfi.ie/) grant No SFI 13/RC/2094_P2 to Lero - the Science Foundation Ireland Research Centre for Software.

Author information

Corresponding author

Correspondence to Muhammet Kürşat Görmez.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Görmez, M.K., Yılmaz, M., Clarke, P.M. (2024). Large Language Models for Software Engineering: A Systematic Mapping Study. In: Yilmaz, M., Clarke, P., Riel, A., Messnarz, R., Greiner, C., Peisl, T. (eds) Systems, Software and Services Process Improvement. EuroSPI 2024. Communications in Computer and Information Science, vol 2179. Springer, Cham. https://doi.org/10.1007/978-3-031-71139-8_5

  • DOI: https://doi.org/10.1007/978-3-031-71139-8_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-71138-1

  • Online ISBN: 978-3-031-71139-8

  • eBook Packages: Computer Science, Computer Science (R0)
