
Enhanced Arabic Information Retrieval: Light Stemming and Stop Words

  • Conference paper
Soft Computing Applications and Intelligent Systems (M-CAIT 2013)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 378)


Abstract

Stemming is the process of reducing inflected words from their written form to their stem, base, or root. For highly inflected languages such as Arabic, stemming improves retrieval performance by reducing the number of word variants. This paper investigates the effectiveness of three stop-word lists (a General stop-word list, the Khoja stop-word list, and a Combined stop-word list) used together with light stemming for Arabic information retrieval. The popular vector space model weighting scheme was used. The idea is to combine the General and Khoja stop-word lists, apply the result together with light stemming to enhance performance, and compare the effects of the different lists on retrieval. The Linguistic Data Consortium (LDC) Arabic Newswire data set was used. The best performance was achieved with the Combined stop-word list together with light stemming.
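The abstract describes a pipeline of stop-word removal, light stemming, and vector space retrieval. The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. The stop-word sets are tiny hypothetical samples standing in for the General and Khoja lists, the affix lists are an assumed simplification loosely modeled on the light10 stemmer of Larkey et al., and tf-idf weighting (log N/df) with cosine similarity is one common vector space model variant, not necessarily the exact scheme used in the paper. All function and variable names (`normalize`, `light_stem`, `preprocess`, `rank`, `GENERAL_STOPWORDS`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter

# Strip Arabic diacritics (U+064B..U+0652) and tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def normalize(text):
    """Remove diacritics and unify alef variants (assumed normalization)."""
    text = DIACRITICS.sub("", text)
    return re.sub("[أإآ]", "ا", text)


# Hypothetical, tiny stand-ins for the General and Khoja stop-word lists.
GENERAL_STOPWORDS = {normalize(w) for w in ("في", "من", "على", "إلى", "عن")}
KHOJA_STOPWORDS = {normalize(w) for w in ("من", "هذا", "التي", "الذي", "كان")}
# The combined list is the union: a word is dropped if either list has it.
COMBINED_STOPWORDS = GENERAL_STOPWORDS | KHOJA_STOPWORDS

# Assumed affix lists, loosely following the light10 light stemmer.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال")  # longest first
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")


def light_stem(word):
    """Strip common prefixes/suffixes without reducing the word to a root."""
    # Drop the conjunction "و" if at least 3 characters remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # Drop at most one article-bearing prefix, keeping at least 2 characters.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    # Try each suffix once, in list order, keeping at least 2 characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word


def preprocess(text, stopwords):
    """Tokenize Arabic letters, remove stop words, then light-stem."""
    tokens = re.findall(r"[\u0621-\u064A]+", normalize(text))
    return [light_stem(t) for t in tokens if t not in stopwords]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, documents, stopwords):
    """Return document indices sorted by tf-idf cosine score, best first."""
    docs = [preprocess(d, stopwords) for d in documents]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    q_terms = Counter(preprocess(query, stopwords))
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in q_terms.items()}
    scores = [cosine(q_vec, v) for v in vectors]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


docs = ["الطالب في المدرسة", "الكتاب على الطاولة", "المعلمون في المدرسة"]
print(rank("المدرسة", docs, COMBINED_STOPWORDS))
```

Forming the combined list as a set union means a token is discarded if it appears in either list. Passing `GENERAL_STOPWORDS` instead of `COMBINED_STOPWORDS` to `preprocess` or `rank` keeps a word such as «كان» as an index term, which illustrates how the choice of stop-word list changes what gets indexed.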




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Atwan, J., Mohd, M., Kanaan, G. (2013). Enhanced Arabic Information Retrieval: Light Stemming and Stop Words. In: Noah, S.A., et al. Soft Computing Applications and Intelligent Systems. M-CAIT 2013. Communications in Computer and Information Science, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40567-9_19


  • DOI: https://doi.org/10.1007/978-3-642-40567-9_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40566-2

  • Online ISBN: 978-3-642-40567-9

  • eBook Packages: Computer Science; Computer Science (R0)
