Stemming Arabic Conjunctions and Prepositions

  • Conference paper
String Processing and Information Retrieval (SPIRE 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3772)

Abstract

Arabic is the fourth most widely spoken language in the world, and is characterised by a high rate of inflection. To cater for this, most Arabic information retrieval systems incorporate a stemming stage. Most existing Arabic stemmers are derived from English equivalents; however, unlike English affixes, most Arabic affixes are difficult to distinguish from the core word. Removing incorrectly identified affixes sometimes results in a valid but incorrect stem, and in most cases reduces retrieval precision. Conjunctions and prepositions form an interesting class of these affixes. In this work, we present novel approaches for dealing with these affixes. Unlike previous approaches, ours focus on retaining valid Arabic core words while maintaining high retrieval performance.
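The over-stemming problem described in the abstract can be illustrated with a small sketch. It assumes a toy, hypothetical lexicon (`LEXICON`) and a fixed set of single-letter conjunction and preposition prefixes (و, ف, ب, ل, ك); the function names `naive_strip` and `lexicon_guarded_strip` are illustrative only and are not the authors' algorithm. In a word such as وجد ("found"), the initial و is part of the root, so blindly stripping it yields جد ("grandfather"), a valid but incorrect stem. The guarded variant shows one simple way to keep valid core words intact.

```python
# Hypothetical toy lexicon of valid Arabic core words (assumption for illustration only).
LEXICON = {"وجد", "جد", "كتاب", "بيت", "قلم"}

# Single-letter conjunction/preposition prefixes: waw, fa, ba, lam, kaf.
PREFIXES = ("و", "ف", "ب", "ل", "ك")


def naive_strip(word: str) -> str:
    """Remove one leading prefix letter whenever it is present (no validity check)."""
    if len(word) > 2 and word[0] in PREFIXES:
        return word[1:]
    return word


def lexicon_guarded_strip(word: str, lexicon: set) -> str:
    """Remove a leading prefix only if the whole word is not a known core word
    and the remainder after stripping is a known core word."""
    if word in lexicon:
        return word  # already a valid core word: keep it intact
    if len(word) > 2 and word[0] in PREFIXES and word[1:] in lexicon:
        return word[1:]
    return word  # unknown remainder: leave the word unchanged


if __name__ == "__main__":
    for w in ("وجد", "وكتاب", "وزرع"):
        print(w, naive_strip(w), lexicon_guarded_strip(w, LEXICON))
```

Here `naive_strip("وجد")` returns the incorrect stem جد, while `lexicon_guarded_strip("وجد", LEXICON)` keeps وجد and still reduces وكتاب ("and a book") to كتاب. A real system would need a far larger lexicon and would also have to handle words that are valid both with and without the prefix.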

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nwesri, A.F.A., Tahaghoghi, S.M.M., Scholer, F. (2005). Stemming Arabic Conjunctions and Prepositions. In: Consens, M., Navarro, G. (eds) String Processing and Information Retrieval. SPIRE 2005. Lecture Notes in Computer Science, vol 3772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575832_23

  • DOI: https://doi.org/10.1007/11575832_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29740-6

  • Online ISBN: 978-3-540-32241-2

  • eBook Packages: Computer Science, Computer Science (R0)