Abstract
Most plagiarism detection techniques rely on either string-based matching or semantic matching of adjacent strings. However, artificial word reordering and paraphrasing have made plagiarism detection a challenging task of significant interest. To address this, we concentrate on identifying overlapping adjacent plagiarized word patterns and overlapping non-adjacent/reordered plagiarized word patterns in the target document(s). The main aim is to capture both the simple cases and the complex cases (i.e., artificial word reordering and/or paraphrasing) of plagiarism in the target document. We first identify the relation between all overlapping word pairs with the help of controlled closeness centrality and semantic similarity. Next, to extract the plagiarized word patterns, we introduce the use of minimum weighted bipartite clique covers. We then use the plagiarized word patterns to identify plagiarized texts in the target document. Our experimental results on publicly available, annotated datasets such as the ‘PAN 2012 plagiarism detection dataset’ and the ‘Student answer related plagiarism dataset’ show that the approach performs better than state-of-the-art systems in this area.
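As a rough illustration of the first step only, the sketch below finds the word pairs shared by a source and a target document and scores words by ordinary closeness centrality on a word co-occurrence graph. The abstract does not define the "controlled" variant of closeness centrality, the semantic similarity measure, or the clique-cover construction, so none of them is reproduced here. The function names (`build_graph`, `closeness`, `overlapping_pairs`) and the `window` parameter are assumptions made for this example, not part of the paper.

```python
from collections import deque


def build_graph(tokens, window=2):
    """Undirected co-occurrence graph (assumed construction):
    two distinct words are linked if they occur within `window` positions."""
    graph = {t: set() for t in tokens}
    for i, t in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != t:
                graph[t].add(tokens[j])
                graph[tokens[j]].add(t)
    return graph


def closeness(graph, node):
    """Plain closeness centrality: number of reachable nodes divided by
    the sum of shortest-path (BFS) distances to them."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def overlapping_pairs(source_tokens, target_tokens):
    """All pairs of words shared by both documents, ordered by first
    occurrence in the target document."""
    shared = set(source_tokens) & set(target_tokens)
    ordered = [w for w in dict.fromkeys(target_tokens) if w in shared]
    return [(a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:]]


if __name__ == "__main__":
    source = "the quick brown fox jumps".split()
    target = "brown fox the quick leaps".split()
    graph = build_graph(target, window=1)
    for a, b in overlapping_pairs(source, target):
        print(a, b, round(closeness(graph, a), 3), round(closeness(graph, b), 3))
```

Because the shared words are paired regardless of their positions, reordered text still yields the same pairs; how those pairs are then weighted and grouped into patterns is the part the paper itself specifies.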
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kumar, N. (2014). A Graph Based Automatic Plagiarism Detection Technique to Handle Artificial Word Reordering and Paraphrasing. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_40
DOI: https://doi.org/10.1007/978-3-642-54903-8_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)