research-article, DOI: 10.1145/3408877.3432439

Early Performance Prediction using Interpretable Patterns in Programming Process Data

Published: 05 March 2021

Abstract

Instructors have limited time and resources to help struggling students, and these resources should be directed to the students who need them most. To address this, researchers have constructed models that can predict students' final course performance early in a semester. However, many predictive models are limited to static and generic student features (e.g., demographics, GPA) rather than computing-specific evidence of a student's progress in class. Many programming environments now capture complete, time-stamped records of students' actions during programming. In this work, we leverage this rich, fine-grained log data to build a model that predicts student course outcomes. From the log data, we extract patterns of behavior that are predictive of students' success using an approach called differential sequence mining. We evaluate our approach on a dataset from 106 students in a block-based, introductory programming course. The patterns extracted by our approach can predict final programming performance with 79% accuracy using only the first programming assignment, outperforming two baseline methods. In addition, we show that the patterns are interpretable and correspond to concrete novice programming behaviors, both effective and ineffective. We also discuss these patterns and their implications for classroom instruction.
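
To make the idea of differential sequence mining concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each student's log is a list of action labels, that patterns are contiguous action n-grams, and that a pattern is "differential" when the fraction of high-performing students who exhibit it differs from the fraction of low-performing students by at least a threshold. The action names, the function names, and the `min_gap` threshold are all hypothetical; the abstract does not specify these details.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def ngrams(seq: Sequence[str], max_len: int) -> Set[Pattern]:
    """Return the set of contiguous patterns of length 1..max_len found in seq."""
    return {
        tuple(seq[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(seq) - n + 1)
    }


def support(sequences: List[Sequence[str]], max_len: int) -> Dict[Pattern, float]:
    """Fraction of sequences in which each pattern occurs at least once."""
    if not sequences:
        return {}
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(ngrams(seq, max_len))
    return {p: c / len(sequences) for p, c in counts.items()}


def differential_patterns(
    high: List[Sequence[str]],
    low: List[Sequence[str]],
    max_len: int = 3,
    min_gap: float = 0.5,  # hypothetical threshold, chosen for illustration
) -> Dict[Pattern, float]:
    """Map each pattern to (support in high - support in low), keeping only
    patterns whose absolute support gap is at least min_gap."""
    sup_high = support(high, max_len)
    sup_low = support(low, max_len)
    result: Dict[Pattern, float] = {}
    for pattern in set(sup_high) | set(sup_low):
        gap = sup_high.get(pattern, 0.0) - sup_low.get(pattern, 0.0)
        if abs(gap) >= min_gap:
            result[pattern] = gap
    return result


# Toy, made-up logs; the action labels are illustrative only.
high = [["edit", "run", "edit", "run"], ["edit", "run", "test"]]
low = [["edit", "edit", "edit", "run"], ["edit", "edit", "delete"]]
patterns = differential_patterns(high, low)
```

In this toy data, a positive gap marks a pattern more common among high performers and a negative gap marks one more common among low performers. Presence or absence of such patterns in a student's log could then serve as features for a classifier, which is one plausible reading of how the extracted patterns feed the prediction model.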


    Published In

    SIGCSE '21: Proceedings of the 52nd ACM Technical Symposium on Computer Science Education
    March 2021
    1454 pages
    ISBN:9781450380621
    DOI:10.1145/3408877
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. model interpretation
    2. sequential pattern mining
    3. student performance prediction
    4. student programming behavior

    Qualifiers

    • Research-article

    Conference

    SIGCSE '21
    Acceptance Rates

    Overall Acceptance Rate 1,787 of 5,146 submissions, 35%
