Abstract
Persian poses several challenges for natural language processing (NLP): right-to-left orthography, complex morphology, intricate grammatical rules, and context-dependent letter forms. In this paper we measure the effectiveness of Perstem, a simple and efficient stemming algorithm, on Persian information retrieval. Our experiments on the Hamshahri corpus at CLEF 2009 show that Perstem substantially improved both precision (+91%) and recall (+43%).
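The improvements above appear to be relative changes against an unstemmed baseline, although the abstract does not say so explicitly. As a minimal sketch of how such figures could be computed, assuming set-based precision and recall for a single query: the document IDs, the function names `precision_recall` and `relative_change`, and all numbers below are hypothetical and are not taken from the paper's experiments.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, treating both inputs as sets."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline` (baseline must be non-zero)."""
    return (new - baseline) / baseline * 100.0


# Hypothetical relevance judgments and result lists (illustration only).
relevant_docs = {"d1", "d2", "d3", "d4"}
unstemmed_run = ["d1", "d5", "d6", "d7"]   # baseline: one relevant document found
stemmed_run = ["d1", "d2", "d5", "d6"]     # with stemming: two relevant documents found

p_base, r_base = precision_recall(unstemmed_run, relevant_docs)
p_stem, r_stem = precision_recall(stemmed_run, relevant_docs)

print(f"precision: {p_base:.2f} -> {p_stem:.2f} ({relative_change(p_base, p_stem):+.0f}%)")
print(f"recall:    {r_base:.2f} -> {r_stem:.2f} ({relative_change(r_base, r_stem):+.0f}%)")
```

Evaluation campaigns such as CLEF typically report measures averaged over many queries rather than a single one, so this sketch only illustrates the arithmetic behind a "+N%" figure.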
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jadidinejad, A.H., Mahmoudi, F., Dehdari, J. (2010). Evaluation of Perstem: A Simple and Efficient Stemming Algorithm for Persian. In: Peters, C., et al. Multilingual Information Access Evaluation I. Text Retrieval Experiments. CLEF 2009. Lecture Notes in Computer Science, vol 6241. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15754-7_11
DOI: https://doi.org/10.1007/978-3-642-15754-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15753-0
Online ISBN: 978-3-642-15754-7
eBook Packages: Computer Science (R0)