Plagiarism Detection in Students’ Answers Using FP-Growth Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13068)

Abstract

According to statistics, the quality of education has fallen over the past year because of the pandemic, and the share of plagiarism in students’ work has increased. Modern plagiarism detection systems handle external plagiarism well: they can filter out works and answers that copy someone else’s published ideas outright. Using natural language processing methods, the proposed algorithm not only detects plagiarism but also correctly classifies students’ answers by the amount of plagiarism they contain. This paper implements a two-step plagiarism detection algorithm. In the experiment, the text was converted into vector form with the GloVe method, then clustered with K-means, and the result was obtained with the FP-Growth unsupervised learning algorithm.
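The abstract names GloVe vectorization followed by K-means clustering. It does not say whether K-means groups individual words or whole answers, how many clusters are used, or how word vectors are combined into a text vector. The following minimal sketch assumes whole answers, each represented by the mean of its word vectors. It clusters them with plain Lloyd's K-means so that answers with near-identical wording fall into the same group. `TOY_VECTORS`, `answer_vector` and `kmeans` are hypothetical names, not the authors' code, and the 2-D vectors are invented stand-ins for real pre-trained GloVe embeddings.

```python
import math
import random

# Hypothetical 2-D stand-ins for pre-trained GloVe vectors. Real GloVe files
# map each word to a dense vector of (typically) 50 to 300 dimensions.
TOY_VECTORS = {
    "photosynthesis": (0.9, 0.1), "converts": (0.8, 0.2), "light": (0.85, 0.15),
    "energy": (0.8, 0.1), "plants": (0.9, 0.2),
    "mitochondria": (0.1, 0.9), "produce": (0.2, 0.8), "atp": (0.15, 0.85),
    "cells": (0.1, 0.8),
}


def answer_vector(text, vectors):
    """Average the vectors of an answer's in-vocabulary words; unknown words are skipped."""
    found = [vectors[w] for w in text.lower().split() if w in vectors]
    if not found:
        raise ValueError("answer contains no in-vocabulary words")
    return tuple(sum(dim) / len(found) for dim in zip(*found))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: return a cluster label (0..k-1) for each point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k randomly chosen input points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


answers = [
    "Photosynthesis converts light energy",
    "Plants use photosynthesis to store light energy",
    "Mitochondria produce ATP",
    "Mitochondria produce ATP in cells",
]
vectors = [answer_vector(a, TOY_VECTORS) for a in answers]
labels = kmeans(vectors, k=2)
print(labels)  # the two photosynthesis answers share one label, the two ATP answers the other
```

Averaging is the simplest way to turn word vectors into a fixed-length text vector, but it discards word order. Two answers that use the same words in a different order therefore map to the same point.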

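FP-Growth mines frequent itemsets without generating candidate sets. It compresses the transactions into a prefix tree (an FP-tree) and then recursively mines conditional pattern bases built from that tree. The abstract does not say what the paper's transactions contain. This minimal sketch therefore assumes that each student answer is one transaction made of its distinct tokens, so that an itemset supported by several answers points to wording they share. `FPNode`, `build_tree`, `fp_growth` and `frequent_itemsets` are hypothetical names rather than the authors' code, and `min_support` is an absolute count of answers.

```python
from collections import defaultdict


class FPNode:
    """One node of an FP-tree: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return the header table and item supports."""
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {item: s for item, s in support.items() if s >= min_support}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, count in weighted_transactions:
        # Drop infrequent items and sort the rest by descending support (ties by name)
        # so that transactions with common frequent items share a prefix path.
        ordered = sorted((i for i in set(items) if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=()):
    """Yield (itemset, support) for every itemset with support >= min_support."""
    header, frequent = build_tree(weighted_transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = (item,) + suffix
        yield frozenset(itemset), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, itemset)


def frequent_itemsets(transactions, min_support):
    """Map each frequent itemset (frozenset) to its absolute support count."""
    return dict(fp_growth([(set(t), 1) for t in transactions], min_support))


answers = [
    "plants turn light into energy",
    "plants turn light into energy using the sun",
    "light becomes energy with water",
    "cells make atp",
]
patterns = frequent_itemsets([a.split() for a in answers], min_support=2)
largest = max(patterns, key=len)
print(sorted(largest), patterns[largest])  # ['energy', 'into', 'light', 'plants', 'turn'] 2
```

Here the five words shared by the first two answers form the largest frequent itemset. The third answer contributes only to the shorter {light, energy} pattern. For real use, a tested library implementation is preferable. One example is `fpgrowth` in mlxtend, which expects a one-hot encoded DataFrame and a fractional `min_support`.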
Acknowledgment

This research was conducted within the framework of grant No. AP09058174, “Development of language-independent unsupervised methods of semantic analysis of large amounts of text data”.

The work was done with partial support from the Mexican Government through grant A1-S-47854 of CONACYT, Mexico, and grants 20211784, 20211884, and 20211178 of the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors thank CONACYT for the computing resources provided through the Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE, Mexico.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Nurlybayeva, S., Akhmetov, I., Gelbukh, A., Mussabayev, R. (2021). Plagiarism Detection in Students’ Answers Using FP-Growth Algorithm. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Soft Computing. MICAI 2021. Lecture Notes in Computer Science, vol 13068. Springer, Cham. https://doi.org/10.1007/978-3-030-89820-5_12

  • DOI: https://doi.org/10.1007/978-3-030-89820-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89819-9

  • Online ISBN: 978-3-030-89820-5

  • eBook Packages: Computer Science; Computer Science (R0)