Using Machine Learning to Identify Patterns in Learner-Submitted Code for the Purpose of Assessment

  • Conference paper
Pattern Recognition (MCPR 2023)

Abstract

Programming has become an important skill in today’s world and is taught widely in both traditional and online settings. Instructors need to grade increasing amounts of student work. Unit testing can contribute to the automation of the grading process, but it cannot assess the structure or partial correctness of code, which is needed for finely differentiated grading. This paper builds on previous research that investigated machine learning models for determining the correctness of programs from token-based features of source code and found that some such models can successfully classify source code according to whether it passes unit tests. This paper makes two further contributions. First, these results are scrutinized under conditions of varying similarity between the code instances used for model training and testing, to better understand how well the models generalize. The models were found not to generalize beyond groups of code instances that perform very similar tasks (corresponding to similar coding assignments). Second, selected binary classification models are used as a base for multi-class prediction with two different methods. Both methods exhibit prediction success well above the random baseline, with the potential to contribute to automated assessment using multi-valued measures of quality (grading schemes), in contrast to the binary pass/fail measure associated with unit testing.
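To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the learner submissions are Python source (the abstract does not name the language), and the feature scheme and the names `token_features` and `leave_one_group_out` are hypothetical. The first function turns source code into token counts; the second holds out one assignment group at a time, which is one way to vary the similarity between training and test code.

```python
import io
import keyword
import tokenize
from collections import Counter

# Layout-only token types that carry no lexical content for this sketch.
_SKIPPED = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}

def token_features(source: str) -> Counter:
    """Count keywords, operators and remaining token types in Python source.

    Illustrative feature scheme only (an assumption, not the paper's).
    """
    counts = Counter()
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            counts["kw:" + tok.string] += 1
        elif tok.type == tokenize.OP:
            counts["op:" + tok.string] += 1
        else:
            counts["type:" + tokenize.tok_name[tok.type]] += 1
    return counts

def leave_one_group_out(samples):
    """Yield (held_out_group, train, test) splits.

    `samples` is a list of (group, source, passed) tuples, where `group`
    identifies the coding assignment. Each group is held out once, so the
    test code comes from an assignment the model never saw in training.
    """
    groups = sorted({group for group, _, _ in samples})
    for held_out in groups:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test
```

The resulting counts could be fed to any binary classifier that predicts the pass/fail label. Comparing accuracy on a random split against accuracy on these group-wise splits indicates whether a model transfers across assignments or only within them.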

Author information

Corresponding author

Correspondence to Botond Tarcsay.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tarcsay, B., Perez-Tellez, F., Vasic, J. (2023). Using Machine Learning to Identify Patterns in Learner-Submitted Code for the Purpose of Assessment. In: Rodríguez-González, A.Y., Pérez-Espinosa, H., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Olvera-López, J.A. (eds) Pattern Recognition. MCPR 2023. Lecture Notes in Computer Science, vol 13902. Springer, Cham. https://doi.org/10.1007/978-3-031-33783-3_5

  • DOI: https://doi.org/10.1007/978-3-031-33783-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33782-6

  • Online ISBN: 978-3-031-33783-3

  • eBook Packages: Computer Science (R0)