Abstract
This paper proposes a new plagiarism detection system for Arabic text documents. The system is based on modeling the relation between documents and their n-gram phrases. Part-of-speech (POS) tagging is applied to the examined documents to help resolve morphological ambiguity during text normalization. Text indexing and stop-word removal are performed using a new method based on morphological analysis. A heuristic pairwise phrase-matching algorithm, which takes into account the substitution of words with their synonyms, is used to build the TF-IDF model of the documents. The hidden associations among the unique n-gram phrases contained in the documents are investigated using Latent Semantic Analysis (LSA), and the pairwise document similarity scores are then derived from the Singular Value Decomposition (SVD) computations. Experiments on various data sets confirmed the performance of the proposed system, which showed promising capabilities in identifying literal plagiarism and some types of intelligent plagiarism. Finally, the proposed system was compared with Plagiarism-Checker-X and outperformed it, especially on intelligent plagiarism.
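The core of the pipeline described above (n-gram phrases, a phrase-by-document TF-IDF matrix, LSA via SVD, and pairwise document similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: tokens are assumed to be already normalized with stop words removed (the POS-guided normalization is out of scope); synonym handling is approximated by a hypothetical `synonyms` dictionary that maps each word to a canonical form before n-grams are built, whereas the paper uses a heuristic pairwise phrase-matching algorithm; the smoothed IDF formula is one common variant, chosen here so that phrases shared by every document are not weighted to zero; and the function names `word_ngrams`, `tfidf_matrix`, and `lsa_similarity` are hypothetical. The SVD step uses NumPy's `numpy.linalg.svd`.

```python
from collections import Counter

import numpy as np


def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_matrix(docs, n=3, synonyms=None):
    """Build a phrase-by-document TF-IDF matrix (rows: unique n-grams).

    docs     -- list of token lists, assumed already normalized and
                stripped of stop words.
    synonyms -- hypothetical dict mapping a word to a canonical synonym;
                a simple stand-in for synonym-aware phrase matching.
    """
    synonyms = synonyms or {}
    counts = []
    for tokens in docs:
        canonical = [synonyms.get(t, t) for t in tokens]
        counts.append(Counter(word_ngrams(canonical, n)))

    # One row per unique n-gram phrase, one column per document.
    vocab = sorted(set().union(*counts))
    row = {phrase: i for i, phrase in enumerate(vocab)}
    tf = np.zeros((len(vocab), len(docs)))
    for j, c in enumerate(counts):
        for phrase, freq in c.items():
            tf[row[phrase], j] = freq

    # Smoothed IDF (an assumed variant): phrases shared by every document
    # keep a positive weight instead of dropping to zero.
    df = np.count_nonzero(tf, axis=1)
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0
    return tf * idf[:, None], vocab


def lsa_similarity(a, k):
    """Pairwise cosine similarity of documents in a rank-k LSA space."""
    # A = U S V^T; document j is represented by row j of V_k S_k.
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(k, len(s))
    doc_vecs = vt[:k].T * s[:k]            # shape: (n_docs, k)
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                # avoid dividing all-zero vectors by 0
    unit = doc_vecs / norms
    return unit @ unit.T                   # (n_docs, n_docs) cosine matrix
```

For example, with word bigrams and `{"pupil": "student"}` as the synonym map, the token lists for "the student copied the essay from the web" and "the pupil copied the essay from the web" produce identical columns and score a cosine similarity of about 1 in a rank-2 space, while an unrelated sentence scores about 0. The rank `k` is the main design knob: truncation discards the smallest singular directions, which can make merely similar documents look identical, so in practice it would need tuning on labeled data.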
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hussein, A.S. (2015). A Plagiarism Detection System for Arabic Documents. In: Filev, D., et al. Intelligent Systems'2014. Advances in Intelligent Systems and Computing, vol 323. Springer, Cham. https://doi.org/10.1007/978-3-319-11310-4_47
DOI: https://doi.org/10.1007/978-3-319-11310-4_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11309-8
Online ISBN: 978-3-319-11310-4
eBook Packages: Engineering, Engineering (R0)