DOI: 10.1145/3568813.3600130 · ICER Conference Proceedings
Research Article · Open Access

Evaluating Distance Measures for Program Repair

Published: 10 September 2023

ABSTRACT

Background and Context: Struggling with programming assignments while learning to program is a common phenomenon in programming courses around the world. Supporting struggling students is a common theme in Computing Education Research (CER), where a wide variety of support methods have been created and evaluated. An important stream of research in this area focuses on program repair, where methods for automatically fixing erroneous code are used to support students as they debug their code. Work in this stream has so far assessed the performance of repair methods by evaluating how close the proposed fixes are to the original erroneous code. These evaluations have mainly relied on edit distance measures such as the sequence edit distance, and there is a lack of research on which distance measure is the most appropriate.
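To make the notion concrete, the sketch below shows a token-level sequence edit distance (Levenshtein distance) of the kind these evaluations rely on. It is a minimal illustration, not the paper's implementation; the tokenization and the example programs are hypothetical.

```python
# Minimal sketch of a token-level sequence edit distance (Levenshtein),
# i.e. the number of token insertions, deletions, and substitutions needed
# to turn an erroneous program into a proposed fix.

def sequence_edit_distance(source_tokens, target_tokens):
    m, n = len(source_tokens), len(target_tokens)
    # dp[i][j] = distance between the first i source and first j target tokens
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source_tokens[i - 1] == target_tokens[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete a source token
                           dp[i][j - 1] + 1,         # insert a target token
                           dp[i - 1][j - 1] + cost)  # keep or substitute
    return dp[m][n]

# Illustrative (hypothetical) student code and proposed fix, tokenized naively.
buggy = "if x = 0 : print ( x )".split()
fixed = "if x == 0 : print ( x )".split()
print(sequence_edit_distance(buggy, fixed))  # 1: a single token substitution
```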

Objectives: Provide insight into measures for quantifying the distance between erroneous code written by a student and a proposed change. We conduct the evaluation in an introductory programming context, where insight into the distance measures can help in choosing a suitable metric to inform which fixes should be suggested to novices.

Method: A team of five experts annotated a subset of the Dublin dataset, creating solutions for over a thousand erroneous programs written by students. We evaluated how the prominent edit distance measures from the CER literature compare against measures used in Natural Language Processing (NLP) tasks for retrieving the experts’ solutions from a pool of proposed solutions. We also evaluated how the expert-generated solutions compare against the solutions proposed by common program repair algorithms. The annotated dataset and the evaluation code are published as part of the work.
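The sketch below illustrates one way such a retrieval-style evaluation can be set up, under the assumption that candidate fixes in a pool are ranked by their closeness to the erroneous program and the rank of the expert's solution is recorded. The helper names, similarity measure, and programs are illustrative, not the published evaluation code.

```python
# Sketch of a retrieval-style evaluation: rank a pool of proposed fixes by a
# closeness measure and report the 1-based rank of the expert's fix.
import difflib

def closeness(a_tokens, b_tokens):
    """Token-level similarity in [0, 1]; higher means the programs are closer."""
    return difflib.SequenceMatcher(None, a_tokens, b_tokens).ratio()

def rank_of_expert_fix(buggy, pool, expert):
    """1-based rank of the expert fix when the pool is sorted closest-first."""
    ranked = sorted(pool, key=lambda cand: closeness(buggy, cand), reverse=True)
    return ranked.index(expert) + 1

buggy = "if x = 0 : print ( x )".split()
pool = [
    "if x == 0 : print ( x )".split(),     # expert's minimal fix
    "print ( x )".split(),                 # drops the conditional entirely
    "while x == 0 : print ( x )".split(),  # changes the construct
]
print(rank_of_expert_fix(buggy, pool, pool[0]))  # 1: the measure prefers the minimal fix
```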

Findings: Our results highlight that the ROUGE score, classically used for evaluating machine summarization, performs well as an evaluation and selection metric for program repair. We also highlight the practical utility of NLP metrics, which allow easier interpretation and comparison of the performance of repair techniques than the classic methods used in the CER literature.
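For reference, the following is a plain re-implementation of ROUGE-L (longest-common-subsequence precision, recall, and F-measure) applied to token sequences. It is a minimal sketch of the metric itself, not the authors' evaluation code, and the example repairs are illustrative.

```python
# Minimal sketch of ROUGE-L over token sequences: the longest common
# subsequence (LCS) between a reference fix and a candidate fix, turned into
# precision, recall, and an F-measure.

def lcs_length(ref, cand):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, c in enumerate(cand, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if r == c else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(ref)][len(cand)]

def rouge_l(reference_tokens, candidate_tokens, beta=1.0):
    """ROUGE-L F-measure between a reference fix and a candidate fix."""
    lcs = lcs_length(reference_tokens, candidate_tokens)
    if lcs == 0:
        return 0.0
    recall = lcs / len(reference_tokens)
    precision = lcs / len(candidate_tokens)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

# Illustrative comparison of two candidate repairs against an expert fix.
expert = "if x == 0 : print ( x )".split()
minimal_fix = "if x == 0 : print ( x )".split()
rewrite = "print ( x ) if x == 0 else None".split()
print(rouge_l(expert, minimal_fix))  # 1.0: identical to the expert fix
print(rouge_l(expert, rewrite))      # lower: fewer tokens shared in order
```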

Implications: Our study highlights the variety of distance metrics used for comparing source code. We find issues with the classically used distance measures that can be mitigated by using NLP metrics. Based on our findings, we recommend including NLP metrics, and in particular the ROUGE metric, in evaluations of new program repair methodologies. We also suggest incorporating NLP metrics into other areas where source code is compared, including plagiarism detection.

