A Pilot Study on AI-Assisted Code Generation with Large Language Models for Software Engineering

  • Conference paper
  • Technologies and Applications of Artificial Intelligence (TAAI 2023)

Abstract

Code generation driven by deep learning has become crucial in contemporary software engineering, enabling the conversion of natural language descriptions into executable code. A noticeable knowledge gap nevertheless persists, motivating a thorough examination of current methodologies and innovations. The primary objective of this research is to provide a comprehensive literature review of the current state of deep learning-powered code generation. Through a rigorous systematic review, 28 influential papers were identified in major academic databases and analyzed with a structured methodology to discern trends, interpret the quantitative findings, and draw meaningful conclusions. The study offers insights into code generation with large language models, helping to bridge the prevailing knowledge gap and pointing to directions for future innovation in the domain.
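
To make the task concrete, the snippet below is a minimal sketch of natural-language-to-code generation with an open code LLM. It assumes the HuggingFace transformers library and the publicly released CodeGen checkpoint [29]; the specific model, prompt, and decoding settings are illustrative assumptions, not the method evaluated in this paper.

# Minimal sketch: turning a natural-language intent into code with an open
# code LLM. Assumes the HuggingFace `transformers` library and Salesforce's
# CodeGen checkpoint [29]; not the specific setup studied in this paper.
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = "Salesforce/codegen-350M-mono"  # small, Python-only CodeGen variant
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# The intent is expressed as a comment plus a function signature; the model
# completes the function body token by token.
prompt = "# Return the n-th Fibonacci number.\ndef fibonacci(n: int) -> int:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Greedy decoding (do_sample=False) keeps the completion deterministic; evaluation suites such as HumanEval [1] instead sample many completions per prompt and report pass@k.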


References

  1. Chen, M., et al.: Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021)

  2. Xu, F.F., Alon, U., Neubig, G., Hellendoorn, V.J.: A systematic evaluation of large language models of code. In: Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, pp. 1–10 (2022)

  3. Hendrycks, D., et al.: Measuring coding challenge competence with APPS. arXiv preprint arXiv:2105.09938 (2021)

  4. Buscemi, A.: A comparative study of code generation using ChatGPT 3.5 across 10 programming languages. arXiv preprint arXiv:2308.04477 (2023)

  5. Yin, P., Neubig, G.: A syntactic neural model for general-purpose code generation. arXiv preprint arXiv:1704.01696 (2017)

  6. Bahrami, M., et al.: PyTorrent: a Python library corpus for large-scale language models. arXiv preprint arXiv:2110.01710 (2021)

  7. Chowdhery, A., et al.: PaLM: scaling language modeling with pathways. arXiv preprint arXiv:2204.02311 (2022)

  8. Kocetkov, D., et al.: The Stack: 3 TB of permissively licensed source code. arXiv preprint arXiv:2211.15533 (2022)

  9. Soliman, A.S., Hadhoud, M.M., Shaheen, S.I.: MarianCG: a code generation transformer model inspired by machine translation. J. Eng. Appl. Sci. 69(1), 1–23 (2022)

  10. Manh, D.N., et al.: The Vault: a comprehensive multilingual dataset for advancing code understanding and generation. arXiv preprint arXiv:2305.06156 (2023)

  11. Shin, J., Nam, J.: A survey of automatic code generation from natural language. J. Inf. Process. Syst. 17(3), 537–555 (2021)

  12. Yu, T., Gu, X., Shen, B.: Code question answering via task-adaptive sequence-to-sequence pre-training. In: 2022 29th Asia-Pacific Software Engineering Conference (APSEC), pp. 229–238. IEEE (2022)

  13. Khan, M.A.M., Bari, M.S., Do, X.L., Wang, W., Parvez, M.R., Joty, S.: xCodeEval: a large scale multilingual multitask benchmark for code understanding, generation, translation and retrieval. arXiv preprint arXiv:2303.03004 (2023)

  14. Yang, Z., Chen, S., Gao, C., Li, Z., Li, G., Lv, R.: Deep learning based code generation methods: a literature review. arXiv preprint arXiv:2303.01056 (2023)

  15. Zan, D., et al.: CERT: continual pre-training on sketches for library-oriented code generation. arXiv preprint arXiv:2206.06888 (2022)

  16. Drozdova, A., Trofimova, E., Guseva, P., Scherbakova, A., Ustyuzhanin, A.: Code4ML: a large-scale dataset of annotated Machine Learning code. PeerJ Comput. Sci. 9, e1230 (2023)

  17. Shen, B., et al.: PanGu-Coder2: boosting large language models for code with ranking feedback. arXiv preprint arXiv:2307.14936 (2023)

  18. Tranfield, D., Denyer, D., Smart, P.: Towards a methodology for developing evidence-informed management knowledge by means of systematic review. Br. J. Manag. 14(3), 207–222 (2003)

  19. Du, X., et al.: ClassEval: a manually-crafted benchmark for evaluating LLMs on class-level code generation. arXiv preprint arXiv:2308.01861 (2023)

  20. Shinn, N., Cassano, F., Labash, B., Gopinath, A., Narasimhan, K., Yao, S.: Reflexion: language agents with verbal reinforcement learning. arXiv preprint arXiv:2303.11366 (2023)

  21. Cassano, F., et al.: MultiPL-E: a scalable and polyglot approach to benchmarking neural code generation. IEEE Trans. Softw. Eng. 49(7), 3675–3691 (2023). https://doi.org/10.1109/TSE.2023.3267446

  22. Gunasekar, S., et al.: Textbooks are all you need. arXiv preprint arXiv:2306.11644 (2023)

  23. Page, M.J., et al.: The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Int. J. Surg. 88, 105906 (2021)

  24. Siddiq, M.L., Casey, B., Santos, J.: A lightweight framework for high-quality code generation. arXiv preprint arXiv:2307.08220 (2023)

  25. Luo, Z., et al.: WizardCoder: empowering code large language models with evol-instruct. arXiv preprint arXiv:2306.08568 (2023)

  26. Muennighoff, N., et al.: OctoPack: instruction tuning code large language models. arXiv preprint arXiv:2308.07124 (2023)

  27. Rozière, B., et al.: Code llama: open foundation models for code. arXiv preprint arXiv:2308.12950 (2023)

  28. Zheng, Q., et al.: CodeGeeX: a pre-trained model for code generation with multilingual benchmarking on HumanEval-X. In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 5673–5684 (2023)

  29. Nijkamp, E., et al.: CodeGen: an open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474 (2022)

  30. Allal, L.B., et al.: SantaCoder: don’t reach for the stars! arXiv preprint arXiv:2301.03988 (2023)

  31. Li, R., et al.: StarCoder: may the source be with you! arXiv preprint arXiv:2305.06161 (2023)

  32. Wang, Y., Le, H., Gotmare, A.D., Bui, N.D., Li, J., Hoi, S.C.: CodeT5+: open code large language models for code understanding and generation. arXiv preprint arXiv:2305.07922 (2023)

  33. Zheng, Q., et al.: CodeGeeX: a pre-trained model for code generation with multilingual evaluations on HumanEval-X. arXiv preprint arXiv:2303.17568 (2023)

Acknowledgments

This research was supported in part by the National Science and Technology Council (NSTC), Taiwan, under grants MOST 110-2410-H-305-013-MY2 and NSTC 112-2425-H-305-002-, and by National Taipei University (NTPU), Taiwan, under grants 112-NTPU-ORDA-F-003, 112-NTPU-ORDA-F-004, and NTPU-112A513E01.

Author information

Corresponding author

Correspondence to Min-Yuh Day.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Liu, HC., Tsai, CT., Day, MY. (2024). A Pilot Study on AI-Assisted Code Generation with Large Language Models for Software Engineering. In: Lee, CY., Lin, CL., Chang, HT. (eds) Technologies and Applications of Artificial Intelligence. TAAI 2023. Communications in Computer and Information Science, vol 2074. Springer, Singapore. https://doi.org/10.1007/978-981-97-1711-8_12

  • DOI: https://doi.org/10.1007/978-981-97-1711-8_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-1710-1

  • Online ISBN: 978-981-97-1711-8

  • eBook Packages: Computer Science (R0)
