Abstract
In introductory programming courses, automated repair tools (ARTs) are used to provide feedback to students who struggle to debug their code. Most successful ARTs take advantage of context-specific educational data to construct repairs of students’ buggy programs. Recent work on student program repair using large language models (LLMs) has also started to utilize such data. An underexplored area in this field is the combination of ARTs with LLMs. In this paper, we propose transferring the repair capabilities of existing ARTs to open large language models by finetuning LLMs on ART corrections of buggy programs. We experiment with this approach using three large datasets of Python programs written by novices. Our results suggest that a finetuned LLM provides more reliable and higher-quality repairs than the repair tool used for finetuning. This opens avenues for further deploying and using educational LLM-based repair techniques.
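The core idea of the abstract — using an ART's output as supervision for finetuning — can be sketched as a data-preparation step. The helper names below (`art_repair`, `build_finetuning_pairs`) and the toy single-bug "repair" are illustrative assumptions, not the paper's actual pipeline or prompt format:

```python
# Sketch: turning ART corrections of buggy student programs into
# (input, target) pairs for seq2seq LLM finetuning.
# `art_repair` is a hypothetical stand-in for a real automated repair tool;
# here it only fixes one known typo for illustration.

def art_repair(buggy_code: str) -> str:
    """Stand-in for an automated repair tool."""
    return buggy_code.replace("retrun", "return")

def build_finetuning_pairs(buggy_programs):
    """Pair each buggy submission with the ART's repair.

    Only programs the ART actually changed are kept, since unchanged
    programs yield no repair signal for the model to learn from.
    """
    pairs = []
    for code in buggy_programs:
        repaired = art_repair(code)
        if repaired != code:
            pairs.append({"input": code, "target": repaired})
    return pairs

examples = build_finetuning_pairs(["def f(x):\n    retrun x + 1\n"])
```

The resulting `{"input": ..., "target": ...}` records could then feed any standard encoder-decoder finetuning loop (e.g. a CodeT5-style model, which the references mention).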
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Koutcheme, C. (2023). Training Language Models for Programming Feedback Using Automated Repair Tools. In: Wang, N., Rebolledo-Mendez, G., Matsuda, N., Santos, O.C., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2023. Lecture Notes in Computer Science, vol 13916. Springer, Cham. https://doi.org/10.1007/978-3-031-36272-9_79
Print ISBN: 978-3-031-36271-2
Online ISBN: 978-3-031-36272-9
eBook Packages: Computer Science, Computer Science (R0)