Plagiarism detection in students’ programming assignments based on semantics: multimedia e-learning based smart assessment methodology

Multimedia Tools and Applications

Abstract

Multimedia-based e-learning provides virtual classrooms to students. The teacher uploads learning materials, programming assignments, and quizzes to the university's Learning Management System (LMS). Students learn lessons from the uploaded videos and then solve the assigned programming tasks and quizzes. Source code plagiarism is a serious threat to academia, and identifying similar source code fragments across different programming languages is a challenging task. To address this problem, this paper proposes a new semantics-based technique for detecting plagiarism between C++ and Java source code within a multimedia-based e-learning and smart assessment methodology. First, it transforms the source code into tokens and calculates semantic similarity through token-by-token comparison. It then computes a scalar semantic similarity value for the complete source code written in C++ and Java. For the experiment, we use a dataset consisting of four case studies (Factorial, Bubble Sort, Binary Search, and a Stack data structure), each in both C++ and Java. The entire experiment was carried out in RStudio with R version 3.4.2. The experimental results, evaluated by comparison, show better semantic similarity for plagiarism detection.
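
The abstract does not spell out the tokenizer or the similarity measure, so the following is only a minimal sketch of the general idea, written in Python rather than the R environment used in the paper. It assumes a simple regular-expression tokenizer, a small hand-written table of equivalent C++ and Java tokens (the `EQUIVALENT` mapping is hypothetical), and cosine similarity over token counts as a stand-in for the scalar score. None of these choices should be read as the authors' actual method.

```python
import math
import re
from collections import Counter

# Hypothetical equivalence table (an assumption for illustration only):
# maps language-specific tokens to a shared label so that C++ and Java
# spellings of the same concept are counted as the same token.
EQUIVALENT = {
    "cout": "PRINT",
    "println": "PRINT",
    "bool": "BOOL",
    "boolean": "BOOL",
}

# Identifiers/keywords, integer literals, or any single symbol character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(source):
    """Split source text into tokens and apply the equivalence table."""
    return [EQUIVALENT.get(tok, tok) for tok in TOKEN_RE.findall(source)]


def cosine_similarity(tokens_a, tokens_b):
    """Return a scalar in [0, 1] comparing two token lists by frequency."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty program shares nothing with any other
    return dot / (norm_a * norm_b)


# Toy inputs modelled on the Factorial case study (not the paper's dataset).
CPP_FACTORIAL = """
int factorial(int n) {
    if (n <= 1) return 1;
    return n * factorial(n - 1);
}
"""

JAVA_FACTORIAL = """
static int factorial(int n) {
    if (n <= 1) return 1;
    return n * factorial(n - 1);
}
"""

score = cosine_similarity(tokenize(CPP_FACTORIAL), tokenize(JAVA_FACTORIAL))
print(f"similarity: {score:.4f}")
```

A bag-of-tokens score like this ignores token order and program structure, so it only illustrates how tokenized C++ and Java sources can be reduced to a single number. A semantics-based measure such as the one the paper proposes would need more than frequency counts.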

Acknowledgements

This work was supported by the National Key Research and Development Program (2016QY06X1205, 2016YFB0800605), and the Technology Research and Development Program of Sichuan, China (18DYF2039, 17ZDYF2583).

Author information

Corresponding author

Correspondence to Junfeng Wang.

Cite this article

Ullah, F., Wang, J., Farhan, M. et al. Plagiarism detection in students’ programming assignments based on semantics: multimedia e-learning based smart assessment methodology. Multimed Tools Appl 79, 8581–8598 (2020). https://doi.org/10.1007/s11042-018-5827-6

  • DOI: https://doi.org/10.1007/s11042-018-5827-6
