Abstract
Stemming is considered one of the most essential steps in natural language processing and information retrieval. For Arabic, however, stemming remains a major challenge because the language has a distinctive morphology that sets it apart from other languages. Most existing algorithms are limited to a fixed set of words, fail to distinguish original letters from affixes, and often rely on dictionaries of patterns or words. We therefore present, for the first time, the design and implementation of an Arabic light stemmer based on the Information Science Research Institute (ISRI) algorithm. The algorithm is evaluated empirically on a newly created Arabic dataset built from content collected from various Arabic websites written in modern Arabic. The experimental results indicate that the proposed method outperforms the existing methods against which it was benchmarked.
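The sketch below illustrates the general idea of light stemming, meaning the removal of a small set of prefixes and suffixes without reducing a word to its root. It is not the ISRI algorithm and not the stemmer proposed in this paper. The affix lists, the three-letter minimum stem length, and the function names `normalize` and `light_stem` are hypothetical choices made only for illustration.

```python
import re

# Hypothetical, deliberately small affix lists for illustration only;
# they are NOT the lists used by ISRI or by the stemmer described above.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "و"]
SUFFIXES = ["ات", "ون", "ين", "ها", "ة", "ه"]

# Assumed minimum stem length (most Arabic roots are triliteral).
MIN_STEM = 3

# Arabic diacritics: tanween, short vowels, shadda and sukun (U+064B..U+0652).
DIACRITICS = re.compile("[\u064B-\u0652]")


def normalize(word: str) -> str:
    """Strip diacritics and map hamzated alef forms to bare alef."""
    word = DIACRITICS.sub("", word)
    return re.sub("[أإآ]", "ا", word)


def light_stem(word: str) -> str:
    """Remove at most one prefix and one suffix, trying longest affixes first.

    An affix is stripped only if at least MIN_STEM letters remain, which
    keeps short words such as "ولد" from losing an original letter.
    """
    word = normalize(word)
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[: -len(suffix)]
            break
    return word


if __name__ == "__main__":
    for w in ["والمعلمون", "الكتاب", "المدرسة", "ولد"]:
        print(w, "->", light_stem(w))
```

The length check is what separates a light stemmer from naive affix stripping: without it, a leading "و" that belongs to the root would be removed just like the conjunction "و". For comparison, NLTK ships an implementation of the original ISRI stemmer as `nltk.stem.isri.ISRIStemmer`.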
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Abd, D.H., Khan, W., Thamer, K.A., Hussain, A.J. (2021). Arabic Light Stemmer Based on ISRI Stemmer. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12838. Springer, Cham. https://doi.org/10.1007/978-3-030-84532-2_4
DOI: https://doi.org/10.1007/978-3-030-84532-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84531-5
Online ISBN: 978-3-030-84532-2
eBook Packages: Computer Science, Computer Science (R0)