ABSTRACT
In an introductory programming (CS1) context, a Refute question asks students for a counter-example which proves that a given code fragment is an incorrect solution for a given task. Such a question can be used as an assessment item to (formatively) develop or (summatively) demonstrate a student's abilities to comprehend the task and the code well enough to recognize a mismatch. These abilities assume greater significance with the emergence of generative AI technologies capable of writing code that is plausible (at least to novice programmers) but not always correct.
Instructors must address three concerns when designing an effective Refute question, each influenced by their specific teaching-learning context: (1) Is the task comprehensible? (2) Is the incorrect code a plausible solution for the task? (3) Is the complexity of finding a counter-example acceptable? While the first concern can often be addressed by reusing tasks from previous code writing questions, addressing the latter two concerns may require substantial instructor effort. We therefore investigate whether concerns (2) and (3) can be addressed using buggy student solutions to the corresponding code writing question from a previous course offering. For 6 code writing questions (from a Fall 2015 C programming course), our automated evaluation system logged 13,847 snapshots of executable student code, of which 10,574 were buggy (i.e., they failed at least one instructor-supplied test case). Code selected randomly from this pool rarely addresses these concerns, and manual selection is infeasible. Our paper makes three contributions. First, we propose an automated mechanism to filter this pool down to a manageable number of snapshots from which appropriate code can be selected manually. Second, we evaluate our semi-automated mechanism with respect to concerns (2) and (3) by surveying a diverse set of 56 experienced participants (instructors, tutors, and teaching assistants). Third, we use this mechanism to seed a public repository of Refute questions and provide a template for creating additional questions using a public resource (CodeCheck).
A Bug's New Life: Creating Refute Questions from Filtered CS1 Student Code Snapshots