A Graph Based Automatic Plagiarism Detection Technique to Handle Artificial Word Reordering and Paraphrasing

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Abstract

Most plagiarism detection techniques rely on either string-based matching or semantic matching of adjacent strings. However, artificial word reordering and paraphrasing have made plagiarism detection a challenging task of significant interest. To address this issue, we concentrate on identifying overlapping adjacent plagiarized word patterns and overlapping non-adjacent or reordered plagiarized word patterns in the target document(s). The main aim is to capture both the simple cases and the complex cases (i.e., artificial word reordering and/or paraphrasing) of plagiarism in the target document. To this end, we first identify the relation between all overlapping word pairs with the help of controlled closeness centrality and semantic similarity. Next, to extract the plagiarized word patterns, we introduce the use of minimum weighted bipartite clique covers. We then use the plagiarized word patterns to identify plagiarized text in the target document. Our experimental results on publicly available, annotated datasets, namely the ‘PAN 2012 plagiarism detection dataset’ and the ‘Student answer related plagiarism dataset’, show that the approach performs better than state-of-the-art systems in this area.
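The distinction the abstract draws between adjacent and non-adjacent/reordered overlapping word patterns can be illustrated with a minimal Python sketch. This is not the paper's method: it uses no closeness centrality, semantic similarity, or bipartite clique covers, and simply compares word pairs directly. The function names (`tokens`, `adjacent_pairs`, `window_pairs`, `overlap`) and the window size of 4 are assumptions made for this illustration.

```python
PUNCT = ".,;:!?\"'()"

def tokens(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in stripped if w]

def adjacent_pairs(words):
    """Ordered pairs of neighbouring words (bigrams)."""
    return set(zip(words, words[1:]))

def window_pairs(words, window=4):
    """Unordered pairs of distinct words co-occurring within `window` tokens."""
    pairs = set()
    for i, w in enumerate(words):
        for v in words[i + 1 : i + window]:
            if v != w:
                pairs.add(frozenset((w, v)))
    return pairs

def overlap(source, target, window=4):
    """Return (adjacent, reordered) overlapping word pairs.

    adjacent:  bigrams that appear in the same order in both texts.
    reordered: word pairs that co-occur within the window in both texts
               but are not among the shared same-order bigrams.
    """
    s, t = tokens(source), tokens(target)
    adjacent = adjacent_pairs(s) & adjacent_pairs(t)
    adjacent_unordered = {frozenset(p) for p in adjacent}
    shared = window_pairs(s, window) & window_pairs(t, window)
    reordered = shared - adjacent_unordered
    return adjacent, reordered

if __name__ == "__main__":
    adj, reo = overlap("The quick brown fox jumps.", "Brown quick the fox jumps.")
    print("adjacent:", sorted(adj))
    print("reordered:", sorted(tuple(sorted(p)) for p in reo))
```

A plain bigram matcher would report only the adjacent pairs here, which is why reordered text can slip past string-based matching; the windowed, order-free pairs recover the overlap that reordering hides.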




Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kumar, N. (2014). A Graph Based Automatic Plagiarism Detection Technique to Handle Artificial Word Reordering and Paraphrasing. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_40


  • DOI: https://doi.org/10.1007/978-3-642-54903-8_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science, Computer Science (R0)
