DOI: 10.1145/3568813.3600130 · ICER Conference Proceedings
Research Article · Open Access

Evaluating Distance Measures for Program Repair

Published: 10 September 2023

ABSTRACT

Background and Context: Struggling with programming assignments while learning to program is a common phenomenon in programming courses around the world. Supporting struggling students is a common theme in Computing Education Research (CER), where a wide variety of support methods have been created and evaluated. An important stream of research in this area focuses on program repair, where methods for automatically fixing erroneous code are used to support students as they debug their code. Work in this stream has so far assessed the performance of repair methods by evaluating how close the proposed fixes are to the original erroneous code. These evaluations have mainly relied on edit distance measures such as the sequence edit distance, and there is a lack of research on which distance measure is the most appropriate.
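To make the notion concrete, the sketch below shows a token-level sequence edit distance (Levenshtein distance) of the kind these evaluations rely on. It is a minimal illustration, not the paper's implementation; the tokenization and the example programs are hypothetical.

```python
# Minimal sketch of a token-level sequence edit distance (Levenshtein),
# i.e. the number of token insertions, deletions, and substitutions needed
# to turn an erroneous program into a proposed fix.

def sequence_edit_distance(source_tokens, target_tokens):
    m, n = len(source_tokens), len(target_tokens)
    # dp[i][j] = distance between the first i source and first j target tokens
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source_tokens[i - 1] == target_tokens[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete a source token
                           dp[i][j - 1] + 1,         # insert a target token
                           dp[i - 1][j - 1] + cost)  # keep or substitute
    return dp[m][n]

# Illustrative (hypothetical) student code and proposed fix, tokenized naively.
buggy = "if x = 0 : print ( x )".split()
fixed = "if x == 0 : print ( x )".split()
print(sequence_edit_distance(buggy, fixed))  # 1: a single token substitution
```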

Objectives: Provide insight into measures for quantifying the distance between erroneous code written by a student and a proposed change. We conduct the evaluation in an introductory programming context, where insight into the distance measures can help in choosing a suitable metric to inform which fixes should be suggested to novices.

Method: A team of five experts annotated a subset of the Dublin dataset, creating solutions for over a thousand erroneous programs written by students. We evaluated how the prominent edit distance measures from the CER literature compare against measures used in Natural Language Processing (NLP) tasks for retrieving the experts’ solutions from a pool of proposed solutions. We also evaluated how the expert-generated solutions compare against the solutions proposed by common program repair algorithms. The annotated dataset and the evaluation code are published as part of the work.
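The sketch below illustrates one way such a retrieval-style evaluation can be set up, under the assumption that candidate fixes in a pool are ranked by their closeness to the erroneous program and the rank of the expert's solution is recorded. The helper names, similarity measure, and programs are illustrative, not the published evaluation code.

```python
# Sketch of a retrieval-style evaluation: rank a pool of proposed fixes by a
# closeness measure and report the 1-based rank of the expert's fix.
import difflib

def closeness(a_tokens, b_tokens):
    """Token-level similarity in [0, 1]; higher means the programs are closer."""
    return difflib.SequenceMatcher(None, a_tokens, b_tokens).ratio()

def rank_of_expert_fix(buggy, pool, expert):
    """1-based rank of the expert fix when the pool is sorted closest-first."""
    ranked = sorted(pool, key=lambda cand: closeness(buggy, cand), reverse=True)
    return ranked.index(expert) + 1

buggy = "if x = 0 : print ( x )".split()
pool = [
    "if x == 0 : print ( x )".split(),     # expert's minimal fix
    "print ( x )".split(),                 # drops the conditional entirely
    "while x == 0 : print ( x )".split(),  # changes the construct
]
print(rank_of_expert_fix(buggy, pool, pool[0]))  # 1: the measure prefers the minimal fix
```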

Findings: Our results highlight that the ROUGE score, classically used for evaluating machine summarization, performs well as an evaluation and selection metric for program repair. We also highlight the practical utility of NLP metrics, which allow easier interpretation and comparison of the performance of repair techniques than the classic methods used in the CER literature.
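For reference, the following is a plain re-implementation of ROUGE-L (longest-common-subsequence precision, recall, and F-measure) applied to token sequences. It is a minimal sketch of the metric itself, not the authors' evaluation code, and the example repairs are illustrative.

```python
# Minimal sketch of ROUGE-L over token sequences: the longest common
# subsequence (LCS) between a reference fix and a candidate fix, turned into
# precision, recall, and an F-measure.

def lcs_length(ref, cand):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, c in enumerate(cand, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if r == c else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(ref)][len(cand)]

def rouge_l(reference_tokens, candidate_tokens, beta=1.0):
    """ROUGE-L F-measure between a reference fix and a candidate fix."""
    lcs = lcs_length(reference_tokens, candidate_tokens)
    if lcs == 0:
        return 0.0
    recall = lcs / len(reference_tokens)
    precision = lcs / len(candidate_tokens)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

# Illustrative comparison of two candidate repairs against an expert fix.
expert = "if x == 0 : print ( x )".split()
minimal_fix = "if x == 0 : print ( x )".split()
rewrite = "print ( x ) if x == 0 else None".split()
print(rouge_l(expert, minimal_fix))  # 1.0: identical to the expert fix
print(rouge_l(expert, rewrite))      # lower: fewer tokens shared in order
```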

Implications: Our study highlights the variety of distance metrics used for comparing source code. We find issues with the classically used distance measures that can be mitigated by using NLP metrics. Based on our findings, we recommend including NLP metrics, and in particular the ROUGE metric, in evaluations of new program repair methodologies. We also suggest incorporating NLP metrics into other areas where source code is compared, including plagiarism detection.

