HYPLAG: Hybrid Arabic Text Plagiarism Detection System

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10859)

Abstract

Plagiarism is the literary theft of paragraphs or sentences from an unreferenced source. This unauthorized behavior is a real problem in scientific research. This paper proposes a Hybrid Arabic Plagiarism Detection System (HYPLAG). HYPLAG combines corpus-based and knowledge-based approaches by utilizing an Arabic semantic resource (Arabic WordNet). A preliminary study on texts from undergraduate students was conducted to understand their behavior and the patterns they use when plagiarizing. The results of the study show that students apply different techniques to plagiarized sentences and that they change sentence components (verbs, nouns, and adjectives). HYPLAG was evaluated on the ExAraPlagDet-2015 dataset against several other approaches that participated in the AraPlagDet PAN@FIRE shared task on extrinsic Arabic plagiarism detection. It obtained higher performance (F-score of 89% vs. 84% for the best-performing system at AraPlagDet) with less computational time.

Notes

  1. http://www.plagiarism.org/resources/facts-and-stats/; accessed in October 2016.

  2. http://www.checkforplagiarism.net/cyber-plagiarism; accessed in October 2016.

  3. https://infogr.am/Plagiarism-606324; accessed in October 2016.

  4. We used the Farasa NER tool for named entity recognition, http://qatsdemo.cloudapp.net/farasa/; accessed in December 2016.

  5. http://globalwordnet.org/arabic-wordnet/; accessed in November 2016.

  6. http://misc-umc.org/AraPlagDet/?i=1; accessed in September 2016.

Acknowledgment

The work of Paolo Rosso was funded by the SomEMBED TIN2015-71147-C2-1-P MINECO research project.

Author information

Corresponding author

Correspondence to Bilal Ghanem.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Ghanem, B., Arafeh, L., Rosso, P., Sánchez-Vega, F. (2018). HYPLAG: Hybrid Arabic Text Plagiarism Detection System. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_33

  • DOI: https://doi.org/10.1007/978-3-319-91947-8_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91946-1

  • Online ISBN: 978-3-319-91947-8

  • eBook Packages: Computer Science, Computer Science (R0)
