Abstract
Stemming is the process of reducing inflected words to their stem, base, or root form. For highly inflected languages such as Arabic, stemming improves retrieval performance by reducing word variants. This paper investigates the effectiveness of three stop-word lists (the General stopwords list, the Khoja stopwords list, and a Combined stopwords list) with light stemming for Arabic information retrieval. Retrieval was evaluated with the vector space model, a popular weighting scheme. The idea is to combine the General and Khoja stopwords lists with light stemming to enhance performance, and to compare the effect of each list on retrieval. The experiments used the Linguistic Data Consortium (LDC) Arabic Newswire data set. The best performance was achieved with the Combined stopwords list together with light stemming.
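The pipeline described in the abstract (stop-word removal, light stemming, then vector space ranking) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the affix lists are only loosely in the spirit of published Arabic light stemmers, the two three-word stop lists are hypothetical stand-ins for the General and Khoja lists, and the TF-IDF weighting with cosine similarity is one common instance of the vector space model, not necessarily the configuration used in the paper.

```python
import math
from collections import Counter

# Illustrative affix lists (assumption: not the lists used in the paper).
# Longer prefixes come first so they are tried before shorter ones.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]

# Tiny hypothetical stand-ins for the General and Khoja stop-word lists.
GENERAL_STOPWORDS = {"في", "من", "على"}
KHOJA_STOPWORDS = {"من", "الى", "عن"}
COMBINED_STOPWORDS = GENERAL_STOPWORDS | KHOJA_STOPWORDS


def light_stem(word):
    """Strip at most one prefix and one suffix, keeping at least 3 characters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def preprocess(text, stopwords):
    """Whitespace-tokenize, drop stop words, then light-stem what remains."""
    return [light_stem(tok) for tok in text.split() if tok not in stopwords]


def tfidf(tokens, idf):
    """Map each term to raw term frequency times inverse document frequency."""
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs, stopwords):
    """Return document indices ordered from most to least similar to the query."""
    doc_tokens = [preprocess(doc, stopwords) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in doc_tokens for term in set(toks))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    doc_vecs = [tfidf(toks, idf) for toks in doc_tokens]
    query_vec = tfidf(preprocess(query, stopwords), idf)
    scores = [cosine(query_vec, vec) for vec in doc_vecs]
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


docs = ["الكتاب في المكتبة", "الطالب من المدرسة"]
ranking = rank("كتاب", docs, COMBINED_STOPWORDS)
```

Passing `GENERAL_STOPWORDS`, `KHOJA_STOPWORDS`, or `COMBINED_STOPWORDS` to `rank` mirrors the comparison the abstract describes: the stemmer and the weighting stay fixed while only the stop list changes.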
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Atwan, J., Mohd, M., Kanaan, G. (2013). Enhanced Arabic Information Retrieval: Light Stemming and Stop Words. In: Noah, S.A., et al. Soft Computing Applications and Intelligent Systems. M-CAIT 2013. Communications in Computer and Information Science, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40567-9_19
DOI: https://doi.org/10.1007/978-3-642-40567-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40566-2
Online ISBN: 978-3-642-40567-9
eBook Packages: Computer Science, Computer Science (R0)