A Novel Technique for Detecting Plagiarism in Documents Exploiting Information Sources

Cognitive Computation

Abstract

Plagiarism occurs when someone uses another person's work without due acknowledgment. Text similarity is central to several fields, such as web document retrieval, information mining, and related-article search. Several approaches have been introduced for detecting plagiarism in text documents, based on the syntactic structure of the text, string similarity, fingerprinting, the semantic meaning underlying the text, and so on. The main limitation of current plagiarism detection systems is that they fail to detect difficult cases of plagiarism. The proposed plagiarism detection approach is a hybrid of semantic and syntactic similarity between text documents. This novel approach combines linguistic information sources non-linearly, using a lexical database, to measure the relatedness between text documents. It uses semantic knowledge to perform cognitive-inspired computing. The framework is capable of detecting intelligent plagiarism cases such as verbatim copying, paraphrasing, rewording within a sentence, and sentence transformation. The approach has been evaluated on the standard PAN-PC-11 dataset. The experiments show that our technique outperforms other strong baseline techniques in terms of precision, recall, F-measure, and plagiarism detection (PlagDet) score.
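The abstract names the evaluation measures but does not define them. As a minimal sketch, the snippet below assumes the usual PAN definition of PlagDet, namely the F-measure divided by log2(1 + granularity), where granularity is at least 1 and penalizes reporting one plagiarized passage as several fragments. The function names and the input numbers are illustrative assumptions, not values or code from the article.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """PlagDet score, assuming the standard PAN definition:
    F1 / log2(1 + granularity), with granularity >= 1."""
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    return f_measure(precision, recall) / math.log2(1 + granularity)


if __name__ == "__main__":
    # Hypothetical detector output, for illustration only.
    print(plagdet(precision=0.8, recall=0.6, granularity=1.0))
    # The same detector reporting each case in three fragments scores lower.
    print(plagdet(precision=0.8, recall=0.6, granularity=3.0))
```

With a granularity of exactly 1 the score equals the F-measure. Any fragmentation of detections lowers it, which is why PlagDet is reported alongside precision and recall rather than instead of them.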

Author information

Corresponding author

Correspondence to Vishal Gupta.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Sahi, M., Gupta, V. A Novel Technique for Detecting Plagiarism in Documents Exploiting Information Sources. Cogn Comput 9, 852–867 (2017). https://doi.org/10.1007/s12559-017-9502-4

  • DOI: https://doi.org/10.1007/s12559-017-9502-4
