ABSTRACT
This paper implements a new stemming algorithm for the Persian (Farsi) language, based on Kazem Taghva's algorithm. Evaluation of the proposed method on small Farsi document collections gives acceptable results. The evaluation also reveals some problems with morphology-based stemmers for Farsi, and methods are proposed to solve them.
Index Terms
- Implementation of a new method for stemming in Persian language