research-article

Open access

Iterative Student Program Planning using Transformer-Driven Feedback

Authors:

Alexander Steinmaurer,

Shriram Krishnamurthi

ITiCSE 2024: Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1

Pages 45 - 51

https://doi.org/10.1145/3649217.3653607

Published: 03 July 2024

Abstract

Problem planning is a fundamental programming skill, and aids students in decomposing tasks into manageable subtasks. While feedback on plans is beneficial for beginners, providing this in a scalable and timely way is an enormous challenge in large courses.

Recent advances in LLMs raise the prospect of helping here. We utilize LLMs to generate code based on students' plans, and evaluate the code against expert-defined test suites. Students receive feedback on their plans and can refine them.

In this report, we share our experience with the design and implementation of this workflow. This tool was used by 544 students in a CS1 course at an Austrian university. We developed a codebook to evaluate their plans and manually applied it to a sample. We show that LLMs can play a valuable role here. However, we also highlight numerous cautionary aspects of using LLMs in this context, many of which will not be addressed merely by having more powerful models (and indeed may be exacerbated by it).
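The workflow outlined in the abstract (a student plan is turned into code, the code is run against an expert-defined test suite, and the result is returned as feedback) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `fake_generate_code` is a hard-coded stand-in for an LLM call, and `run_tests`, `feedback_for_plan`, and the sample task are hypothetical, not the authors' tool.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a test case is (argument tuple, expected result).
TestCase = Tuple[tuple, object]


def fake_generate_code(plan: str) -> str:
    """Stand-in for an LLM call (assumption): returns Python source defining `solve`."""
    if "ignore negative" in plan.lower():
        return "def solve(xs):\n    return sum(x for x in xs if x >= 0)\n"
    return "def solve(xs):\n    return sum(xs)\n"


def run_tests(source: str, tests: List[TestCase]) -> List[str]:
    """Run the generated source and return one message per failing test."""
    namespace: dict = {}
    exec(source, namespace)  # toy only; untrusted code would need sandboxing
    solve = namespace["solve"]
    failures = []
    for args, expected in tests:
        try:
            actual = solve(*args)
        except Exception as exc:
            failures.append(f"solve{args!r} raised {type(exc).__name__}")
            continue
        if actual != expected:
            failures.append(
                f"solve{args!r} returned {actual!r}, expected {expected!r}"
            )
    return failures


def feedback_for_plan(
    plan: str,
    tests: List[TestCase],
    generate: Callable[[str], str] = fake_generate_code,
) -> List[str]:
    """One iteration: plan -> generated code -> test results as feedback."""
    return run_tests(generate(plan), tests)


# Hypothetical task: sum a list, ignoring negative entries.
TESTS: List[TestCase] = [(([1, 2, 3],), 6), (([4, -1, 2],), 6)]

first = feedback_for_plan("Add up all the numbers.", TESTS)
second = feedback_for_plan("Add up the numbers, but ignore negative ones.", TESTS)
```

In this sketch the first plan omits a subtask, so one test fails and the failure message serves as feedback; the refined plan passes both tests. A real deployment would replace the stub with an actual model and would have to isolate the generated code before running it.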

Index Terms

Iterative Student Program Planning using Transformer-Driven Feedback
1. Applied computing
  1. Education

Published In

ITiCSE 2024: Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1

July 2024

776 pages

ISBN: 9798400706004

DOI: 10.1145/3649217

General Chairs:
Mattia Monga (University of Milan, Italy), Violetta Lonati (University of Milan, Italy), Erik Barendsen (Radboud University, The Netherlands)

Program Chairs:
Judithe Sheard (Monash University, Australia), James Paterson (Glasgow Caledonian University, Scotland)

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGCSE: ACM Special Interest Group on Computer Science Education

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

NSF (National Science Foundation)

Conference

ITiCSE 2024

Sponsor:

SIGCSE

ITiCSE 2024: Innovation and Technology in Computer Science Education

July 8 - 10, 2024

Milan, Italy

Acceptance Rates

Overall Acceptance Rate 552 of 1,613 submissions, 34%
