A Plagiarism Detection System for Arabic Text-Based Documents

Jadalla, Ameera; Elnagar, Ashraf

doi:10.1007/978-3-642-30428-6_12

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 7299)

Included in the following conference series:

Pacific-Asia Workshop on Intelligence and Security Informatics

Abstract

This paper presents Iqtebas 1.0, a novel plagiarism detection system for Arabic text-based documents. This is an initial work dedicated to plagiarism detection in Arabic documents. Arabic is a morphologically rich language and is among the most widely used languages in the world and on the Internet. Given a document and a set of suspected files, our goal is to compute the originality value of the examined document. The originality value of a text is computed from the distance between each sentence in the text and the closest sentence in the suspected files, if one exists. The proposed system is built around a search engine in order to reduce the cost of pairwise similarity comparison. For the indexing process, we use the winnowing n-gram fingerprinting algorithm to reduce the index size: the n-grams of each sentence are represented by hash codes, and the winnowing algorithm computes the fingerprints of the sentence from them. As a result, the search time is improved and the detection process is accurate and robust. In the experiments, Iqtebas 1.0 achieved a recall of 94% and a precision of 99%. Moreover, a comparison between Iqtebas and the well-known plagiarism detection system SafeAssign confirmed the high performance of Iqtebas.
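The sentence-level fingerprinting and lookup described in the abstract can be illustrated with a minimal sketch. This is not the Iqtebas implementation: the abstract does not give the n-gram size, the window size, the hash function, the distance measure, or the formula for the originality value. The sketch assumes word 3-grams, a winnowing window of 4, truncated MD5 hashes, Jaccard similarity between fingerprint sets, and an originality value defined as the fraction of sentences with no sufficiently similar indexed sentence. All function names and the `threshold` parameter are hypothetical, and Arabic-specific preprocessing such as normalization or stemming is omitted.

```python
import hashlib
import re
from collections import defaultdict


def tokenize(sentence):
    """Extract word tokens; no Arabic normalization or stemming (assumption)."""
    return re.findall(r"\w+", sentence)


def ngram_hashes(tokens, n):
    """Hash every word n-gram to a 32-bit integer (truncated MD5, an assumption)."""
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams and tokens:
        grams = [" ".join(tokens)]  # sentence shorter than n: use it whole
    return [int(hashlib.md5(g.encode("utf-8")).hexdigest()[:8], 16) for g in grams]


def winnow(hashes, window):
    """Keep the minimum hash of each sliding window of the hash sequence."""
    if not hashes:
        return set()
    if len(hashes) <= window:
        return {min(hashes)}
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def fingerprint(sentence, n=3, window=4):
    """Winnowed fingerprint set of one sentence."""
    return winnow(ngram_hashes(tokenize(sentence), n), window)


def build_index(sentences, n=3, window=4):
    """Inverted index: fingerprint hash -> ids of sentences containing it."""
    index = defaultdict(set)
    fingerprints = []
    for sid, sentence in enumerate(sentences):
        fp = fingerprint(sentence, n, window)
        fingerprints.append(fp)
        for h in fp:
            index[h].add(sid)
    return index, fingerprints


def originality_value(document, sources, threshold=0.5, n=3, window=4):
    """Fraction of document sentences with no indexed sentence whose
    Jaccard similarity reaches `threshold` (hypothetical definition)."""
    if not document:
        return 1.0
    index, source_fps = build_index(sources, n, window)
    flagged = 0
    for sentence in document:
        fp = fingerprint(sentence, n, window)
        # Only sentences sharing at least one fingerprint are compared,
        # which avoids a full pairwise scan of the source sentences.
        candidates = set()
        for h in fp:
            candidates |= index.get(h, set())
        best = max(
            (len(fp & source_fps[c]) / len(fp | source_fps[c]) for c in candidates),
            default=0.0,
        )
        if best >= threshold:
            flagged += 1
    return 1.0 - flagged / len(document)
```

Calling `originality_value(document_sentences, source_sentences)` returns a value in [0, 1]. Because only the window minima are indexed, the index holds fewer hashes per sentence than a full n-gram index would, which is the size reduction the abstract attributes to winnowing.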

Author information

Authors and Affiliations

Department of Computer Science, University of Sharjah, P.O. Box 27272, Sharjah, UAE
Ameera Jadalla & Ashraf Elnagar

Editor information

Editors and Affiliations

The University of Hong Kong, 7/F Meng Wah Complex, Pokfulam, Hong Kong
Michael Chau
Virginia Tech, Blacksburg, VA, USA
G. Alan Wang
City University of Hong Kong, Hong Kong
Wei Thoo Yue
Artificial Intelligence Lab, University of Arizona, Tucson, Arizona, USA
Hsinchun Chen

About this paper

Cite this paper

Jadalla, A., Elnagar, A. (2012). A Plagiarism Detection System for Arabic Text-Based Documents. In: Chau, M., Wang, G.A., Yue, W.T., Chen, H. (eds) Intelligence and Security Informatics. PAISI 2012. Lecture Notes in Computer Science, vol 7299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30428-6_12

DOI: https://doi.org/10.1007/978-3-642-30428-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30427-9
Online ISBN: 978-3-642-30428-6
eBook Packages: Computer Science, Computer Science (R0)
