DOI: 10.1145/3177148.3180089
research-article

A Study of Graph Based Stemmer in Arabic Extrinsic Plagiarism Detection

Published: 27 March 2018

Abstract

Arabic stemming, a Natural Language Processing technique, is becoming an increasingly significant research domain because Arabic is one of the most challenging languages. This study proposes a new graph-based approach to stemming Arabic documents and evaluates the impact of this stemmer on extrinsic plagiarism detection. In this approach, a word is represented by a directed weighted graph made up of a set of connected components, each of which has a specific representation. A stem is then selected by comparing the word's representation with a database of 450 stems. The results obtained show that this stemmer improves the detection of extrinsic plagiarism.
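
The abstract does not specify how the graph is built or how representations are compared, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes that nodes are letters, that a directed edge links each letter to the one that follows it, that the edge weight counts how often that pair occurs, and that graphs are compared by a weighted Jaccard overlap of their edges. The function names and the three-entry stem list are hypothetical stand-ins for the 450-stem database, and the connected-component representation mentioned in the abstract is not modeled.

```python
from collections import defaultdict


def word_graph(word):
    """Build a directed weighted graph for a word (assumed construction).

    Each key (a, b) is an edge from letter a to the letter b that
    immediately follows it; the value is how often that pair occurs.
    """
    edges = defaultdict(int)
    for a, b in zip(word, word[1:]):
        edges[(a, b)] += 1
    return dict(edges)


def similarity(g1, g2):
    """Weighted Jaccard overlap between two edge-weight maps (assumed measure)."""
    keys = set(g1) | set(g2)
    if not keys:
        return 0.0
    shared = sum(min(g1.get(k, 0), g2.get(k, 0)) for k in keys)
    total = sum(max(g1.get(k, 0), g2.get(k, 0)) for k in keys)
    return shared / total


def select_stem(word, stems):
    """Return the candidate stem whose graph is most similar to the word's graph."""
    graph = word_graph(word)
    return max(stems, key=lambda stem: similarity(graph, word_graph(stem)))


# Hypothetical toy stem list; the paper's database holds 450 stems.
STEMS = ["كتب", "درس", "علم"]

if __name__ == "__main__":
    print(select_stem("المكتبة", STEMS))
```

Because Arabic morphology often inserts letters inside a stem, a comparison based only on adjacent letter pairs will miss some matches; the sketch is meant to show the general shape of graph-based stem selection rather than a usable stemmer.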


Cited By

  • (2024) A Systematic Review of Stemmers of Indian and Non-Indian Vernacular Languages. ACM Transactions on Asian and Low-Resource Language Information Processing 23:1 (1-51). DOI: 10.1145/3604612. Online publication date: 15-Jan-2024
  • (2023) An Analytical Analysis of Text Stemming Methodologies in Information Retrieval and Natural Language Processing Systems. IEEE Access 11 (133681-133702). DOI: 10.1109/ACCESS.2023.3332710. Online publication date: 2023
  • (2020) Empirical evaluation and study of text stemming algorithms. Artificial Intelligence Review. DOI: 10.1007/s10462-020-09828-3. Online publication date: 15-Apr-2020

Published In

MedPRAI '18: Proceedings of the 2nd Mediterranean Conference on Pattern Recognition and Artificial Intelligence
March 2018
135 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • IAPR: International Association for Pattern Recognition

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Arabic
  2. Extrinsic plagiarism detection
  3. Graph
  4. Natural language processing
  5. Stemming

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MedPRAI '18
