DOI: 10.1145/3472673.3473958
research-article

The good, the bad, and the ugly: mining for patterns in student source code

Published: 23 August 2021

ABSTRACT

Source code mining has been used to discover interesting structural regularities, API usage patterns, refactoring opportunities, bugs, crosscutting concerns, code clones, and systematic changes. In this paper we present a pattern mining algorithm that uses frequent tree mining to find interesting good, bad, or ugly coding idioms written by undergraduate students taking an introductory programming course. We do so by looking for patterns that distinguish positive examples, corresponding to the more correct answers to a question, from negative examples, corresponding to solutions that failed the question. We report promising initial results of this algorithm applied to the source code of over 500 students. Even though more work is needed to fine-tune and validate the algorithm further, we hope that it can lead to interesting insights that can eventually be integrated into an intelligent recommendation system to help students learn from their errors.
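The idea of contrasting positive and negative examples can be illustrated with a minimal sketch. This is not the algorithm from the paper. It assumes the submissions are Python programs, restricts "patterns" to single parent-child edges between AST node types (a real frequent tree miner would grow larger subtrees), and scores each pattern by the difference in support between the two groups. The function names and the `min_diff` threshold are hypothetical, introduced only for illustration.

```python
import ast
from collections import Counter


def tree_patterns(source):
    """Return the set of (parent type, child type) AST edges in one program.

    Assumption: an edge is the simplest possible tree pattern; real tree
    miners enumerate larger subtrees as well.
    """
    patterns = set()
    for node in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(node):
            patterns.add((type(node).__name__, type(child).__name__))
    return patterns


def support(pattern_sets):
    """Count in how many programs each pattern occurs at least once."""
    counts = Counter()
    for patterns in pattern_sets:
        counts.update(patterns)
    return counts


def discriminative_patterns(positive, negative, min_diff=0.5):
    """Rank patterns by relative support in positive minus negative programs.

    A score near +1 marks a pattern typical of passing solutions,
    a score near -1 a pattern typical of failing ones.
    """
    pos_sets = [tree_patterns(src) for src in positive]
    neg_sets = [tree_patterns(src) for src in negative]
    pos_sup, neg_sup = support(pos_sets), support(neg_sets)
    scored = []
    for pattern in set(pos_sup) | set(neg_sup):
        diff = pos_sup[pattern] / len(pos_sets) - neg_sup[pattern] / len(neg_sets)
        if abs(diff) >= min_diff:
            scored.append((pattern, diff))
    # Strongest contrasts first; ties broken by pattern name for stable output.
    return sorted(scored, key=lambda item: (-abs(item[1]), item[0]))


if __name__ == "__main__":
    # Toy data: passing solutions return the sum, failing ones print instead.
    passing = [
        "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n",
        "def f(xs):\n    return sum(xs)\n",
    ]
    failing = [
        "def f(xs):\n    for x in xs:\n        print(x)\n",
        "def f(xs):\n    print(sum(xs))\n",
    ]
    for pattern, score in discriminative_patterns(passing, failing):
        print(pattern, score)
```

On the toy data, the edge from a function definition to a `return` statement appears only in the passing programs, while a call used as a bare expression statement appears only in the failing ones. This is the kind of contrast the abstract describes, though at a much smaller pattern size.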


Published in

EASEAI 2021: Proceedings of the 3rd International Workshop on Education through Advanced Software Engineering and Artificial Intelligence
August 2021
61 pages
ISBN: 9781450386241
DOI: 10.1145/3472673

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States
