ABSTRACT
In an introductory programming (CS1) context, a Refute question asks students for a counter-example which proves that a given code fragment is an incorrect solution for a given task. Such a question can be used as an assessment item to (formatively) develop or (summatively) demonstrate a student's abilities to comprehend the task and the code well enough to recognize a mismatch. These abilities assume greater significance with the emergence of generative AI technologies capable of writing code that is plausible (at least to novice programmers) but not always correct.
Instructors must address three concerns when designing an effective Refute question, each influenced by their specific teaching-learning context: (1) Is the task comprehensible? (2) Is the incorrect code a plausible solution for the task? (3) Is the complexity of finding a counter-example acceptable? While the first concern can often be addressed by reusing tasks from previous code writing questions, addressing the latter two concerns may require substantial instructor effort. We therefore investigate whether concerns (2) and (3) can be addressed using buggy student solutions to the corresponding code writing question from a previous course offering. For 6 code writing questions (from a Fall 2015 C programming course), our automated evaluation system logged 13,847 snapshots of executable student code, of which 10,574 were buggy (i.e., they failed at least one instructor-supplied test case). Code selected randomly from this pool rarely addresses these concerns, and manual selection is infeasible. Our paper makes three contributions. First, we propose an automated mechanism to filter this pool down to a manageable number of snapshots from which appropriate code can be selected manually. Second, we evaluate our semi-automated mechanism with respect to concerns (2) and (3) by surveying a diverse set of 56 experienced participants (instructors, tutors, and teaching assistants). Third, we use this mechanism to seed a public repository of Refute questions and provide a template for creating additional questions using a public resource (CodeCheck).
A Bug's New Life: Creating Refute Questions from Filtered CS1 Student Code Snapshots