A Plagiarism Detection System for Arabic Documents

  • Conference paper
Intelligent Systems'2014

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 323)

Abstract

This paper proposes a new plagiarism detection system for Arabic text documents. The system models the relation between documents and their n-gram phrases. Part-of-speech tagging is applied to the examined documents to help resolve morphological ambiguity during text normalization. Text indexing and stop-word removal are performed using a new method based on morphological analysis. A heuristic pairwise phrase-matching algorithm is used to build the documents' TF-IDF model, taking into account the substitution of words with their synonyms. The hidden associations among the unique n-gram phrases contained in the documents are investigated using Latent Semantic Analysis, and the pairwise document similarity scores are then derived from the Singular Value Decomposition computations. Experiments with various data sets confirmed the performance of the proposed system, which showed promising capabilities in identifying literal plagiarism and some types of intelligent plagiarism. Finally, in a comparison with Plagiarism-Checker-X, the proposed system performed better, especially for intelligent plagiarism.
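The core of the pipeline described in the abstract (an n-gram TF-IDF model, followed by Latent Semantic Analysis via Singular Value Decomposition, followed by pairwise document similarity) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, the choice of bigrams, and the rank of 2 are all assumptions made for illustration, and the Arabic-specific steps (part-of-speech tagging, morphological normalization, stop-word removal, synonym substitution) are omitted.

```python
import numpy as np


def word_ngrams(tokens, n):
    """Return the list of word n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_matrix(docs, n=2):
    """Build a phrase-by-document TF-IDF matrix over word n-grams.

    Assumption: plain whitespace tokenization and a smoothed IDF;
    the paper's own weighting scheme may differ.
    """
    grams_per_doc = [word_ngrams(d.split(), n) for d in docs]
    vocab = sorted({g for grams in grams_per_doc for g in grams})
    index = {g: i for i, g in enumerate(vocab)}

    tf = np.zeros((len(vocab), len(docs)))
    for j, grams in enumerate(grams_per_doc):
        for g in grams:
            tf[index[g], j] += 1

    df = np.count_nonzero(tf, axis=1)              # documents containing each phrase
    idf = np.log((1 + len(docs)) / (1 + df)) + 1   # smoothed inverse document frequency
    return tf * idf[:, None]


def lsa_similarity(docs, n=2, rank=2):
    """Pairwise cosine similarity of documents in a rank-reduced LSA space."""
    a = tfidf_matrix(docs, n)
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(rank, len(s))
    doc_vecs = (s[:k, None] * vt[:k]).T            # one k-dimensional row per document
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                        # avoid division by zero
    unit = doc_vecs / norms
    return unit @ unit.T


if __name__ == "__main__":
    sample = [
        "the cat sat on the mat",
        "the cat sat on the mat today",
        "stock prices fell sharply this week",
    ]
    print(np.round(lsa_similarity(sample), 3))
```

In this sketch the first two sample documents share most of their bigrams and therefore score close to each other, while the third shares none with them. A real system would also need a threshold on the similarity score to flag a pair as suspicious; the abstract does not state one.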

Author information

Corresponding author

Correspondence to Ashraf S. Hussein.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Hussein, A.S. (2015). A Plagiarism Detection System for Arabic Documents. In: Filev, D., et al. Intelligent Systems'2014. Advances in Intelligent Systems and Computing, vol 323. Springer, Cham. https://doi.org/10.1007/978-3-319-11310-4_47

  • DOI: https://doi.org/10.1007/978-3-319-11310-4_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11309-8

  • Online ISBN: 978-3-319-11310-4

  • eBook Packages: Engineering, Engineering (R0)
