Arabic Light Stemmer Based on ISRI Stemmer

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12838)

Abstract

Stemming is one of the most essential steps in natural language processing and information retrieval. In Arabic, however, stemming remains a major challenge because the language's particular morphology sets it apart from other languages. Most existing algorithms are limited to a fixed number of words, create ambiguity between original letters and affixes, and often rely on dictionaries of patterns or words. We therefore present, for the first time, the design and implementation of an Arabic light stemmer based on the Information Science Research Institute (ISRI) algorithm. The algorithm is evaluated empirically on a newly created Arabic dataset compiled from different Arabic websites whose content is written in modern Arabic. The experimental results indicate that the proposed method outperforms the existing methods against which it was benchmarked.
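To make the idea of light stemming concrete, the following is a minimal sketch of affix stripping in Python. It is not the authors' algorithm and not the ISRI stemmer: the affix lists, the minimum stem length, and the function name `light_stem` are all assumptions chosen for illustration. It also shows the ambiguity mentioned above, where letters that belong to the word are mistaken for an affix.

```python
# Hypothetical affix lists, loosely modelled on common light stemmers.
# They are assumptions for illustration, NOT the lists used in the paper.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")
MIN_STEM_LEN = 3  # assumed guard so that short words are left untouched


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, longest match first."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    # "the library", "and the book", "teachers", "inflammation"
    for w in ("المكتبة", "والكتاب", "معلمون", "التهاب"):
        print(w, "->", light_stem(w))
```

In the last example the leading "ال" of "التهاب" is part of the word rather than the definite article, yet the sketch still strips it. A purely list-based approach cannot tell the two cases apart, which is the kind of ambiguity between original letters and affixes that the abstract refers to.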

Notes

  1. https://data.mendeley.com/datasets/spvbf5bgjs

Author information

Corresponding author

Correspondence to Wasiq Khan.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Abd, D.H., Khan, W., Thamer, K.A., Hussain, A.J. (2021). Arabic Light Stemmer Based on ISRI Stemmer. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12838. Springer, Cham. https://doi.org/10.1007/978-3-030-84532-2_4

  • DOI: https://doi.org/10.1007/978-3-030-84532-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-84531-5

  • Online ISBN: 978-3-030-84532-2

  • eBook Packages: Computer Science, Computer Science (R0)
