ABSTRACT
Most current automated evaluation tools grade a program solely by functionally testing its outputs. This approach suffers from both false positives (reporting errors where there are none) and false negatives (missing actual errors). In this paper, we present a novel system that emulates manual evaluation of programming assignments by examining the structure, rather than the functional output, of a program, using structural similarity between the given program and a reference solution. We propose an evaluation rubric for scoring structural similarity with respect to a reference solution, and an ML-based approach to map the system-predicted scores to the scores computed using the rubric. We evaluate the system empirically on a corpus of Python programs extracted from the popular programming platform HackerRank, combined with programming assignments submitted by students in an undergraduate Python programming course. The preliminary results are encouraging: the reported error is as low as 12 percent, with a deviation of about 3 percent, showing that the automatically generated scores correlate strongly with the instructor-assigned scores.
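The paper's rubric and similarity metric are not reproduced in this abstract; as a minimal illustrative sketch of the underlying idea, structural similarity between a submission and a reference solution can be approximated by comparing the programs' abstract syntax trees rather than their outputs. The sketch below uses only Python's standard-library `ast` module, and the function names, the normalisation, and the example programs are all hypothetical, not the authors' actual metric:

```python
import ast
from collections import Counter

def ast_node_counts(source: str) -> Counter:
    """Count occurrences of each AST node type in a Python program."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

def structural_similarity(submission: str, reference: str) -> float:
    """Crude structural score in [0, 1]: overlap of the two programs'
    AST node-type multisets (1.0 means identical node-type profiles)."""
    a = ast_node_counts(submission)
    b = ast_node_counts(reference)
    overlap = sum((a & b).values())              # multiset intersection size
    total = max(sum(a.values()), sum(b.values()))
    return overlap / total if total else 1.0

# Two functionally equivalent programs with different structure:
reference = "def f(n):\n    return sum(i * i for i in range(n))"
loop_version = "def f(n):\n    s = 0\n    for i in range(n):\n        s += i * i\n    return s"

print(structural_similarity(reference, reference))      # identical structure -> 1.0
print(structural_similarity(loop_version, reference))   # related but distinct structure
```

A full system along the lines the abstract describes would replace this node-count heuristic with a rubric-driven comparison and then fit a regression model mapping such similarity scores to instructor-assigned scores.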