A Plagiarism Detection System for Arabic Documents

Hussein, Ashraf S.

doi:10.1007/978-3-319-11310-4_47

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 323)

Abstract

This paper proposes a new plagiarism detection system for Arabic text documents. The system models the relation between documents and their n-gram phrases. Part-of-Speech tagging is applied to the examined documents to help resolve morphological ambiguity during text normalization. Text indexing and stop-word removal are performed using a new method based on morphological analysis. A heuristic pairwise phrase-matching algorithm is used to build the documents' TF-IDF model, taking into account the substitution of words with their synonyms. The hidden associations among the unique n-gram phrases in the documents are investigated using Latent Semantic Analysis, and the pairwise document similarity scores are then derived from the Singular Value Decomposition computations. Experiments with various data sets confirmed the system's performance, showing promising capabilities in identifying literal plagiarism and some types of intelligent plagiarism. Finally, the proposed system was compared with Plagiarism-Checker-X and outperformed it, especially for intelligent plagiarism.
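
The pipeline outlined in the abstract (n-gram phrases, a TF-IDF model, Latent Semantic Analysis, pairwise similarity) can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the smoothed IDF weighting, the rank `k`, and the English placeholder tokens are all assumptions made for illustration, and the Arabic-specific steps (POS tagging, normalization, stop-word removal, synonym substitution) are omitted. To stay dependency-free, the truncated SVD is obtained from a power iteration on the document Gram matrix rather than from a library SVD routine.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Return the word n-gram phrases of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_rows(docs, n=2):
    """Build one TF-IDF vector per document over the unique n-gram phrases."""
    counts = [Counter(ngrams(doc.split(), n)) for doc in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Smoothed IDF: an assumed weighting, not taken from the paper.
    idf = {p: math.log((1 + n_docs) / (1 + sum(p in c for c in counts))) + 1
           for p in vocab}
    return [[c[p] * idf[p] for p in vocab] for c in counts]


def top_eigenpairs(gram, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(gram)
    g = [row[:] for row in gram]
    pairs = []
    for _ in range(k):
        start = [float(i + 1) for i in range(n)]  # fixed start vector
        scale = math.sqrt(sum(x * x for x in start))
        v = [x / scale for x in start]
        for _step in range(iters):
            w = [sum(g[r][c] * v[c] for c in range(n)) for r in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        gv = [sum(g[r][c] * v[c] for c in range(n)) for r in range(n)]
        lam = sum(a * b for a, b in zip(v, gv))  # Rayleigh quotient
        pairs.append((lam, v))
        for r in range(n):  # deflate: remove the component just found
            for c in range(n):
                g[r][c] -= lam * v[r] * v[c]
    return pairs


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def lsa_similarity(docs, n=2, k=2):
    """Pairwise document similarity in a rank-k latent space.

    If A is the phrase-by-document TF-IDF matrix with SVD A = U S V^T, then
    A^T A = V S^2 V^T, so the rows of V_k S_k (the document coordinates) follow
    from the top-k eigenpairs of the document Gram matrix.
    """
    rows = tfidf_rows(docs, n)
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in rows] for ri in rows]
    pairs = top_eigenpairs(gram, min(k, len(docs)))
    coords = [[math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
              for i in range(len(docs))]
    return [[cosine(a, b) for b in coords] for a in coords]


# Placeholder documents (hypothetical, English for readability).
example_docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox leaps over the lazy dog",
    "stock markets fell sharply after the announcement",
]
scores = lsa_similarity(example_docs, n=2, k=2)
for row in scores:
    print([round(s, 3) for s in row])
```

In this toy collection the first two documents share six of their eight bigrams and therefore score close to 1 in the rank-2 space, while the third shares none and scores close to 0 against both. A real system would replace the whitespace tokenizer with the Arabic preprocessing the abstract describes and would likely use a library SVD for larger collections.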

Author information

Authors and Affiliations

Faculty of Computer and Information Sciences, Ain Shams University, Cairo, 11566, Egypt
Ashraf S. Hussein

Corresponding author

Correspondence to Ashraf S. Hussein.

Editor information

Editors and Affiliations

Ford Motor Company, Research & Advanced Engineering, Dearborn, Michigan, USA
D. Filev
Industrial Research Institute for Automation and Measurements (PIAP), Warsaw, Poland
J. Jabłkowski
Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
J. Kacprzyk
Polish Academy of Sciences and WIT - Warsaw School of Information Technology, Systems Research Institute, Warsaw, Poland
M. Krawczak
Bulgarian Academy of Sciences, Institute of Information and Communication Technologies, Sofia, Bulgaria
I. Popchev
Department of Computer Engineering, Częstochowa University of Technology, Częstochowa, Poland
L. Rutkowski
Bulgarian Academy of Sciences, Institute of Information and Communication Technologies, Sofia, Bulgaria
V. Sgurev
Department of Computer and Information Technologies, “Prof. Assen Zlatarov” University Faculty of Technical Sciences, Bourgas, Bulgaria
E. Sotirova
Industrial Research Institute for Automation and Measurements (PIAP), Warsaw, Poland
P. Szynkarczyk
Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
S. Zadrozny

About this paper

Cite this paper

Hussein, A.S. (2015). A Plagiarism Detection System for Arabic Documents. In: Filev, D., et al. Intelligent Systems'2014. Advances in Intelligent Systems and Computing, vol 323. Springer, Cham. https://doi.org/10.1007/978-3-319-11310-4_47

DOI: https://doi.org/10.1007/978-3-319-11310-4_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11309-8
Online ISBN: 978-3-319-11310-4
eBook Packages: Engineering, Engineering (R0)
