A Graph Based Automatic Plagiarism Detection Technique to Handle Artificial Word Reordering and Paraphrasing

Kumar, Niraj

doi:10.1007/978-3-642-54903-8_40

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Most plagiarism detection techniques rely on either string-based matching or semantic matching of adjacent strings. However, artificial word reordering and paraphrasing have made plagiarism detection a challenging task of significant interest. To address this issue, we concentrate on identifying overlapping adjacent plagiarized word patterns and overlapping non-adjacent (reordered) plagiarized word patterns in the target document(s). The main aim is to capture both the simple cases and the complex cases (i.e., artificial word reordering and/or paraphrasing) of plagiarism in the target document. To this end, we first identify the relation between all overlapping word pairs with the help of controlled closeness centrality and semantic similarity. Next, to extract the plagiarized word patterns, we introduce the use of minimum weighted bipartite clique covers. We then use the plagiarized word patterns to identify plagiarized texts in the target document. Our experimental results on publicly available, annotated datasets, such as the ‘PAN 2012 plagiarism detection dataset’ and the ‘Student answer related plagiarism dataset’, show that the approach performs better than state-of-the-art systems in this area.
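
As a rough illustration of the first step described in the abstract, the following Python sketch builds a word co-occurrence graph, computes ordinary closeness centrality, and separates shared word pairs into "adjacent" and "reordered" groups. It is a sketch under stated assumptions, not the paper's method: the abstract does not define "controlled" closeness centrality, so plain closeness centrality stands in for it, and the function names, the window size, and the rule used to call a pair "reordered" (adjacent in the source but not in the target) are all hypothetical choices made for this example. Semantic similarity and the bipartite clique cover step are not shown.

```python
from collections import deque
from itertools import combinations


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking words that occur within `window` positions.

    Assumption: the window size is an illustrative parameter, not taken
    from the paper.
    """
    graph = {t: set() for t in tokens}
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def closeness_centrality(graph, node):
    """Plain closeness: reachable nodes divided by summed shortest-path lengths.

    Assumption: this is the textbook definition, used as a stand-in for the
    paper's "controlled" variant, which the abstract does not specify.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search over the unweighted graph
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def overlapping_pairs(source, target):
    """Split word pairs shared by both texts into adjacent and reordered.

    Hypothetical rule: a pair is "adjacent" if its words are neighbours in
    both texts, and "reordered" if they are neighbours in the source only.
    """
    src, tgt = tokenize(source), tokenize(target)
    shared = sorted(set(src) & set(tgt))
    src_adjacent = {frozenset(p) for p in zip(src, src[1:])}
    tgt_adjacent = {frozenset(p) for p in zip(tgt, tgt[1:])}
    adjacent, reordered = [], []
    for a, b in combinations(shared, 2):
        pair = frozenset((a, b))
        if pair in src_adjacent and pair in tgt_adjacent:
            adjacent.append((a, b))
        elif pair in src_adjacent:
            reordered.append((a, b))
    return adjacent, reordered
```

Calling `overlapping_pairs("the quick brown fox jumps", "brown the quick jumps fox")` puts the pairs that stayed side by side in the first list and the pairs that were pulled apart in the second. In a fuller pipeline, a centrality score like the one above could be one of several signals used to weight each pair before pattern extraction.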

Author information

Authors and Affiliations

TCS Innovation Lab, Tata Consultancy Services, New Delhi, India
Niraj Kumar

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Av. Juan Dios Bátiz, Col. Nueva Industrial Vallejo, 07738, Mexico D.F, Mexico
Alexander Gelbukh

Cite this paper

Kumar, N. (2014). A Graph Based Automatic Plagiarism Detection Technique to Handle Artificial Word Reordering and Paraphrasing. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_40

DOI: https://doi.org/10.1007/978-3-642-54903-8_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)
