
Fine-Tuning Large Language Models for Answering Programming Questions with Code Snippets

  • Conference paper
  • In: Computational Science – ICCS 2023 (ICCS 2023)

Abstract

We study the ability of pretrained large language models (LLMs) to answer questions from online question answering forums such as Stack Overflow. We consider question-answer pairs where the main part of the answer consists of source code. On two benchmark datasets—CoNaLa and a newly collected dataset based on Stack Overflow—we investigate how a closed-book question answering system can be improved by fine-tuning the LLM for the downstream task, prompt engineering, and data preprocessing. We use publicly available autoregressive language models such as GPT-Neo, CodeGen, and PanGu-Coder, and with the proposed fine-tuning achieve a BLEU score of 0.4432 on the CoNaLa test set, significantly exceeding the previous state of the art for this task.
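
To make the setup concrete, here is a minimal fine-tuning sketch in the spirit of the abstract, using Hugging Face transformers on CoNaLa-style (intent, snippet) pairs. The checkpoint, prompt template, file name, and hyperparameters below are illustrative assumptions, not the authors' actual configuration:

    # Hypothetical sketch: fine-tune a small causal LM on CoNaLa-style
    # (intent, snippet) pairs for closed-book code QA. Checkpoint, prompt
    # template, and hyperparameters are assumptions, not the paper's setup.
    import json
    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              Trainer, TrainingArguments)

    MODEL = "EleutherAI/gpt-neo-125M"  # smallest public GPT-Neo checkpoint
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo has no pad token
    model = AutoModelForCausalLM.from_pretrained(MODEL)

    def build_example(pair):
        # One possible prompt format: NL question, then the code answer.
        text = f"# Question: {pair['intent']}\n# Answer:\n{pair['snippet']}"
        enc = tokenizer(text, truncation=True, max_length=256,
                        padding="max_length")
        # Causal-LM objective; mask the loss on padding positions.
        enc["labels"] = [t if m == 1 else -100
                         for t, m in zip(enc["input_ids"],
                                         enc["attention_mask"])]
        return enc

    with open("conala-train.json") as f:  # CoNaLa train split (JSON array)
        train_rows = [build_example(p) for p in json.load(f)]

    class ListDataset(torch.utils.data.Dataset):
        def __init__(self, rows): self.rows = rows
        def __len__(self): return len(self.rows)
        def __getitem__(self, i):
            return {k: torch.tensor(v) for k, v in self.rows[i].items()}

    args = TrainingArguments(output_dir="ft-conala", num_train_epochs=3,
                             per_device_train_batch_size=8,
                             learning_rate=5e-5,
                             fp16=torch.cuda.is_available())
    trainer = Trainer(model=model, args=args,
                      train_dataset=ListDataset(train_rows))
    trainer.train()
    trainer.save_model("ft-conala")         # saves weights and config
    tokenizer.save_pretrained("ft-conala")  # so the checkpoint reloads

The same loop applies to the other decoder-only checkpoints mentioned in the abstract, such as CodeGen; only the checkpoint name changes.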


Notes

  1. https://stackoverflow.com/.

  2. https://conala-corpus.github.io/.

  3. https://data.stackexchange.com/.

  4. https://paperswithcode.com/sota/code-generation-on-conala (a BLEU-ranked leaderboard; see the scoring sketch after this list).
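
Note 4 points to the CoNaLa leaderboard, which ranks systems by corpus-level BLEU, the metric behind the 0.4432 reported in the abstract. The following is a minimal, hypothetical scoring sketch for the checkpoint saved in the earlier example; the checkpoint path, prompt template, and sacrebleu's default BLEU settings are assumptions and may differ from the paper's exact evaluation:

    # Hypothetical inference + scoring sketch. Paths, prompt template, and
    # BLEU settings (sacrebleu defaults) are assumptions, not the paper's.
    import torch
    import sacrebleu
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("ft-conala")
    model = AutoModelForCausalLM.from_pretrained("ft-conala").eval()

    def answer(question, max_new_tokens=64):
        # Same prompt template as used during fine-tuning.
        prompt = f"# Question: {question}\n# Answer:\n"
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model.generate(ids, max_new_tokens=max_new_tokens,
                                 pad_token_id=tokenizer.eos_token_id)
        # Strip the prompt tokens, keep only the generated answer.
        return tokenizer.decode(out[0, ids.shape[1]:],
                                skip_special_tokens=True)

    # Tiny illustrative test set; real evaluation uses the CoNaLa test split.
    questions = ["sort a pandas DataFrame df by column 'x'"]
    gold = ["df.sort_values(by='x')"]

    hyps = [answer(q) for q in questions]
    bleu = sacrebleu.corpus_bleu(hyps, [gold])  # score on a 0-100 scale
    print(f"BLEU = {bleu.score / 100:.4f}")     # abstract reports 0.4432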


Acknowledgements

The work of Sergey Nikolenko was carried out within the strategic project “Digital Business” as part of the Strategic Academic Leadership Program “Priority 2030” at NUST MISiS.

Author information

Corresponding author: Vadim Lomshakov.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Lomshakov, V., Kovalchuk, S., Omelchenko, M., Nikolenko, S., Aliev, A. (2023). Fine-Tuning Large Language Models for Answering Programming Questions with Code Snippets. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 14074. Springer, Cham. https://doi.org/10.1007/978-3-031-36021-3_15

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-36021-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36020-6

  • Online ISBN: 978-3-031-36021-3

  • eBook Packages: Computer Science, Computer Science (R0)
