DOI: 10.1145/3576882.3617916
Research Article · Best Paper

A Bug's New Life: Creating Refute Questions from Filtered CS1 Student Code Snapshots

Published: 05 December 2023

ABSTRACT

In an introductory programming (CS1) context, a Refute question asks students for a counter-example which proves that a given code fragment is an incorrect solution for a given task. Such a question can be used as an assessment item to (formatively) develop or (summatively) demonstrate a student's abilities to comprehend the task and the code well enough to recognize a mismatch. These abilities assume greater significance with the emergence of generative AI technologies capable of writing code that is plausible (at least to novice programmers) but not always correct.

Instructors must address three concerns while designing an effective Refute question, each influenced by their specific teaching-learning context: (1) Is the task comprehensible? (2) Is the incorrect code a plausible solution for the task? (3) Is the complexity of finding a counter-example acceptable? While the first concern can often be addressed by reusing tasks from previous code writing questions, addressing the latter two concerns may require substantial instructor effort. We therefore investigate whether concerns (2) and (3) can be addressed by buggy student solutions for the corresponding code writing question from a previous course offering. For 6 code writing questions (from a Fall 2015 C programming course), our automated evaluation system logged 13,847 snapshots of executable student code, of which 10,574 were buggy (i.e., they failed at least one instructor-supplied test case). Code selected randomly from this pool rarely addresses these concerns, and manual selection is infeasible. Our paper makes three contributions. First, we propose an automated mechanism to filter this pool to a more manageable number of snapshots from which appropriate code can be selected manually. Second, we evaluate our semi-automated mechanism with respect to concerns (2) and (3) by surveying a diverse set of 56 experienced participants (instructors, tutors, and teaching assistants). Third, we use this mechanism to seed a public repository of Refute questions and provide a template to create additional questions using a public resource (CodeCheck).
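The core mechanic of a Refute question, and of identifying buggy snapshots in the first place, is differential testing: an input is a valid counter-example exactly when the buggy code's output disagrees with the correct output for the task. The sketch below illustrates this idea only; it is not from the paper. The task ("largest of three integers") and all function names are hypothetical, and Python stands in for the course's C.

```python
# Hypothetical sketch of checking a student's proposed counter-example
# for a Refute question. Task (illustrative): return the largest of
# three integers.

def reference_max_of_three(a, b, c):
    """Correct solution for the task."""
    return max(a, b, c)

def buggy_max_of_three(a, b, c):
    """Plausible but incorrect solution: never compares a with c."""
    if a >= b:
        return a  # bug: c is ignored on this branch
    return max(b, c)

def is_counter_example(inputs):
    """An input refutes the buggy code iff the two outputs disagree."""
    return buggy_max_of_three(*inputs) != reference_max_of_three(*inputs)

print(is_counter_example((1, 2, 3)))  # False: outputs agree by luck
print(is_counter_example((2, 1, 5)))  # True: buggy code returns 2, not 5
```

The same disagreement check, run against an instructor-supplied test suite instead of a single student input, is what classifies a logged snapshot as "buggy" (failing at least one test case), producing the pool the paper's filtering mechanism then narrows down.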


Published in

CompEd 2023: Proceedings of the ACM Conference on Global Computing Education Vol. 1
December 2023, 180 pages
ISBN: 9798400700484
DOI: 10.1145/3576882

        Copyright © 2023 ACM


Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rate

Overall acceptance rate: 33 of 100 submissions, 33%
