DOI: 10.1145/3486001.3486228
research-article

Source-Code Similarity Measurement: Syntax Tree Fingerprinting for Automated Evaluation

Published: 22 October 2021

ABSTRACT

Most current automated evaluation tools grade a program solely by functionally testing its outputs. This approach suffers from both false positives (i.e., finding errors where there are none) and false negatives (missing actual errors). In this paper, we present a novel system that emulates manual evaluation of programming assignments based on the structure of the program rather than its functional output, using structural similarity between the given program and a reference solution. We propose an evaluation rubric for scoring structural similarity with respect to a reference solution. We present an ML-based approach to map the system-predicted scores to the scores computed using the rubric. The system is evaluated empirically on a corpus of Python programs extracted from the popular programming platform HackerRank, combined with programming assignments submitted by students in an undergraduate Python programming course. The preliminary results are encouraging: reported errors are as low as 12 percent with a deviation of about 3 percent, showing that the automatically generated scores correlate highly with the instructor-assigned scores.
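To make the idea of structural similarity concrete, the following is a minimal illustrative sketch, not the system described in the paper. It assumes one simple form of syntax tree fingerprinting: every subtree of a Python program's AST is hashed by node type only, so identifier names and literal values are ignored, and two programs are compared by the Jaccard overlap of their fingerprint sets. The function names `subtree_fingerprints` and `structural_similarity` and the choice of Jaccard overlap are assumptions made for illustration; the abstract does not specify the paper's actual fingerprinting scheme or rubric.

```python
import ast
import hashlib


def subtree_fingerprints(source):
    """Return a set of hashes, one per distinct AST subtree shape in `source`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    fingerprints = set()

    def visit(node):
        # Hash the node type together with its children's hashes, so that
        # identifier names and literal values do not affect the result.
        child_hashes = [visit(child) for child in ast.iter_child_nodes(node)]
        label = type(node).__name__ + "(" + ",".join(child_hashes) + ")"
        digest = hashlib.sha1(label.encode("utf-8")).hexdigest()
        fingerprints.add(digest)
        return digest

    visit(ast.parse(source))
    return fingerprints


def structural_similarity(submission, reference):
    """Jaccard similarity of the two programs' subtree fingerprint sets."""
    a = subtree_fingerprints(submission)
    b = subtree_fingerprints(reference)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


reference = "def f(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total\n"
renamed = "def g(items):\n    s = 0\n    for i in items:\n        s += i\n    return s\n"
different = "def h(xs):\n    return sum(xs)\n"

# A submission that only renames identifiers has the same tree shape.
print(structural_similarity(renamed, reference))
# A structurally different solution shares only some small subtrees.
print(structural_similarity(different, reference))
```

Under these assumptions, a submission that differs from the reference only in variable names receives full similarity, while a functionally equivalent but structurally different solution receives a partial score. A grading system would still need a rubric or learned mapping, as the abstract describes, to turn such a raw similarity into a mark.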


  • Published in

    AIMLSystems '21: Proceedings of the First International Conference on AI-ML Systems
    October 2021
    170 pages
    ISBN:9781450385947
    DOI:10.1145/3486001

    Copyright © 2021 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited
