DOI: 10.1145/3649217.3653607
research-article
Open access

Iterative Student Program Planning using Transformer-Driven Feedback

Published: 03 July 2024

Abstract

Problem planning is a fundamental programming skill that helps students decompose tasks into manageable subtasks. While feedback on plans is beneficial for beginners, providing it in a scalable and timely way is an enormous challenge in large courses.
Recent advances in large language models (LLMs) raise the prospect of helping here. We use LLMs to generate code from students' plans and evaluate that code against expert-defined test suites. Students then receive feedback on their plans and can refine them.
In this report, we share our experience with the design and implementation of this workflow. This tool was used by 544 students in a CS1 course at an Austrian university. We developed a codebook to evaluate their plans and manually applied it to a sample. We show that LLMs can play a valuable role here. However, we also highlight numerous cautionary aspects of using LLMs in this context, many of which will not be addressed merely by having more powerful models (and indeed may be exacerbated by it).
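
The workflow described in the abstract (plan, generated code, test suite, feedback, refined plan) can be illustrated with a minimal sketch. This is not the authors' implementation: `evaluate_plan`, `Feedback`, the `solve` entry point, and the canned `fake_llm` stand-in are all hypothetical names introduced here, and a real system would call an actual model and run the generated code in a sandbox.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# One expert-defined test: positional arguments and the expected result.
TestCase = Tuple[tuple, object]


@dataclass
class Feedback:
    passed: int
    total: int
    failures: List[str]


def evaluate_plan(
    plan: str,
    generate_code: Callable[[str], str],  # hypothetical LLM wrapper: plan -> source
    tests: List[TestCase],
    entry_point: str = "solve",  # assumed name of the generated function
) -> Feedback:
    """Generate code from a plan, run the test suite, and summarize the result."""
    source = generate_code(plan)
    namespace: dict = {}
    try:
        # Illustration only: untrusted generated code should be sandboxed.
        exec(source, namespace)
        func = namespace[entry_point]
    except Exception as err:
        return Feedback(0, len(tests), [f"generated code did not load: {err!r}"])

    failures = []
    for args, expected in tests:
        try:
            actual = func(*args)
        except Exception as err:
            failures.append(f"{entry_point}{args} raised {err!r}")
            continue
        if actual != expected:
            failures.append(
                f"{entry_point}{args} returned {actual!r}, expected {expected!r}"
            )
    return Feedback(len(tests) - len(failures), len(tests), failures)


def fake_llm(plan: str) -> str:
    """Stand-in for an LLM call; returns canned code so the sketch runs offline."""
    if "ignore negative" in plan:
        return "def solve(xs):\n    return sum(x for x in xs if x >= 0)\n"
    return "def solve(xs):\n    return sum(xs)\n"


# A tiny hypothetical test suite: sum a list, skipping negative values.
tests = [(([1, 2, 3],), 6), (([4, -1],), 4)]

# First attempt: the plan omits a requirement, so one test fails.
first = evaluate_plan("add up the numbers", fake_llm, tests)

# The student refines the plan after seeing the feedback and resubmits.
revised = evaluate_plan("add up the numbers, ignore negative ones", fake_llm, tests)
```

In this sketch the student never sees the generated code, only the test outcomes in `Feedback`. That keeps the focus on the plan itself, though whether the deployed tool hides the code is an assumption here rather than something the abstract states.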

    Published In

    ITiCSE 2024: Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1
    July 2024
    776 pages
    ISBN: 9798400706004
    DOI: 10.1145/3649217
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automated feedback
    2. llms
    3. program planning

    Conference

    ITiCSE 2024
    Acceptance Rates

    Overall Acceptance Rate 552 of 1,613 submissions, 34%
