Abstract
Arabic is the fourth most widely spoken language in the world, and is characterised by a high rate of inflection. To cater for this, most Arabic information retrieval systems incorporate a stemming stage. Most existing Arabic stemmers are derived from English equivalents; however, unlike English, most affixes in Arabic are difficult to discriminate from the core word. Removing incorrectly identified affixes sometimes results in a valid but incorrect stem, and in most cases reduces retrieval precision. Conjunctions and prepositions form an interesting class of these affixes. In this work, we present novel approaches for dealing with these affixes. Unlike previous approaches, our approaches focus on retaining valid Arabic core words, while maintaining high retrieval performance.
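To make the problem concrete, the following is a minimal sketch of why unconditional prefix removal can yield a valid but incorrect stem, and of how checking against a word list can retain the core word. It is an illustration only and is not the algorithm proposed in the paper. The prefix list, the toy lexicon, and the function names `strip_naive` and `strip_checked` are all assumptions introduced here.

```python
# Illustrative sketch only; not the authors' method.
# Assumed single-letter conjunction/preposition prefixes:
# و (wa, "and"), ف (fa, "so"), ب (bi, "with"), ل (li, "for"), ك (ka, "like").
PARTICLE_PREFIXES = ("و", "ف", "ب", "ل", "ك")

# Hypothetical toy lexicon standing in for a real list of valid Arabic words.
TOY_LEXICON = {
    "كتاب",  # kitab, "book"
    "وجد",   # wajada, "found"
    "جد",    # jadd, "grandfather"
    "بيت",   # bayt, "house"
}


def strip_naive(word: str) -> str:
    """Remove a leading particle letter without any validity check."""
    if len(word) > 2 and word.startswith(PARTICLE_PREFIXES):
        return word[1:]
    return word


def strip_checked(word: str, lexicon=TOY_LEXICON) -> str:
    """Remove a leading particle letter only when doing so is safe.

    The word is kept unchanged if it is itself a known word, or if the
    remainder after removing the first letter is not a known word.
    """
    if word in lexicon:
        return word
    if len(word) > 2 and word.startswith(PARTICLE_PREFIXES):
        remainder = word[1:]
        if remainder in lexicon:
            return remainder
    return word


if __name__ == "__main__":
    for w in ("وكتاب", "وجد", "بيت"):
        print(w, strip_naive(w), strip_checked(w))
```

In this sketch, the naive version turns وجد ("found") into جد ("grandfather"), a valid word with a different meaning, whereas the checked version leaves it intact and still reduces وكتاب ("and a book") to كتاب. A real system would need a far larger lexicon and rules for multi-letter and stacked prefixes.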
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nwesri, A.F.A., Tahaghoghi, S.M.M., Scholer, F. (2005). Stemming Arabic Conjunctions and Prepositions. In: Consens, M., Navarro, G. (eds) String Processing and Information Retrieval. SPIRE 2005. Lecture Notes in Computer Science, vol 3772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575832_23
DOI: https://doi.org/10.1007/11575832_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29740-6
Online ISBN: 978-3-540-32241-2
eBook Packages: Computer Science, Computer Science (R0)