Using Machine Learning to Identify Patterns in Learner-Submitted Code for the Purpose of Assessment

Tarcsay, Botond; Perez-Tellez, Fernando; Vasic, Jelena

doi:10.1007/978-3-031-33783-3_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13902)

Included in the following conference series:

Mexican Conference on Pattern Recognition

Abstract

Programming has become an important skill in today’s world and is taught widely in both traditional and online settings. Instructors need to grade increasing amounts of student work. Unit testing can help automate the grading process, but it cannot assess the structure or partial correctness of code, both of which are needed for finely differentiated grading. This paper builds on previous research that investigated machine learning models for determining the correctness of programs from token-based features of source code and found that some such models can successfully classify source code according to whether it passes unit tests. This paper makes two further contributions. First, these results are scrutinized under conditions of varying similarity between the code instances used for model training and testing, for a better understanding of how well the models generalize. It was found that the models do not generalize outside of groups of code instances performing very similar tasks (corresponding to similar coding assignments). Second, selected binary classification models are used as a base for multi-class prediction with two different methods. Both methods exhibit prediction success well above the random baseline, with the potential to contribute to automated assessment with multi-valued measures of quality (grading schemes), in contrast to the binary pass/fail measure associated with unit testing.
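
The two ideas the abstract relies on, token-based features and testing on assignments not seen during training, can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the learner code is Python, uses the standard-library tokenizer as a stand-in for whatever token features the paper uses, and the function names, sample submissions, and assignment labels are all hypothetical.

```python
import io
import keyword
import tokenize
from collections import Counter

# Token types that carry layout rather than content; skipped in this sketch.
_SKIPPED = {
    tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}


def token_features(source):
    """Return a bag-of-tokens Counter for one Python source string.

    Assumption: keywords and operators are counted by their text,
    everything else (names, numbers, strings) by its token type.
    """
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.OP:
            counts["OP:" + tok.string] += 1
        elif tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            counts["KW:" + tok.string] += 1
        else:
            counts[tokenize.tok_name[tok.type]] += 1
    return counts


def leave_one_group_out(groups):
    """Yield (held_out, train_indices, test_indices) per distinct group.

    Each group (here: one coding assignment) is held out in turn, so the
    test instances come from an assignment the model never trained on.
    """
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Hypothetical submissions: (assignment label, source code).
submissions = [
    ("count_up", "for i in range(3):\n    print(i)\n"),
    ("count_up", "i = 0\nwhile i < 3:\n    i += 1\n"),
    ("sum_list", "total = sum([1, 2, 3])\n"),
]
features = [token_features(src) for _, src in submissions]
groups = [name for name, _ in submissions]
for held_out, train_idx, test_idx in leave_one_group_out(groups):
    print(held_out, train_idx, test_idx)
```

A classifier trained on the `train` indices and scored on the `test` indices of each split would give one rough way to probe generalization across assignments. The paper's actual features, models, and similarity conditions may differ.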

Author information

Authors and Affiliations

School of Enterprise Computing and Digital Transformation, Technological University Dublin, Dublin, Ireland
Botond Tarcsay, Fernando Perez-Tellez & Jelena Vasic

Corresponding author

Correspondence to Botond Tarcsay.

Editor information

Editors and Affiliations

Centro de Investigación Científica y de Educación Superior de Ensenada, Tepic, Mexico
Ansel Yoan Rodríguez-González
Centro de Investigación Científica y de Educación Superior de Ensenada, Tepic, Mexico
Humberto Pérez-Espinosa
Instituto Nacional de Astrofísica, Óptica y Electrónica, Santa María Tonantzintla, Mexico
José Francisco Martínez-Trinidad
Instituto Nacional de Astrofísica, Óptica y Electrónica, Santa María Tonantzintla, Mexico
Jesús Ariel Carrasco-Ochoa
Autonomous University of Puebla, Puebla, Mexico
José Arturo Olvera-López

About this paper

Cite this paper

Tarcsay, B., Perez-Tellez, F., Vasic, J. (2023). Using Machine Learning to Identify Patterns in Learner-Submitted Code for the Purpose of Assessment. In: Rodríguez-González, A.Y., Pérez-Espinosa, H., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Olvera-López, J.A. (eds) Pattern Recognition. MCPR 2023. Lecture Notes in Computer Science, vol 13902. Springer, Cham. https://doi.org/10.1007/978-3-031-33783-3_5

DOI: https://doi.org/10.1007/978-3-031-33783-3_5
Published: 09 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33782-6
Online ISBN: 978-3-031-33783-3
eBook Packages: Computer Science, Computer Science (R0)
