Abstract
In this paper, we introduce a new lexicon-free, probabilistic stemmer for use in a Turkish information retrieval system that is under development. The stemmer has linear computational complexity and achieves a success rate of 95.8% in testing. The main contribution of this paper is a thorough description of a probabilistic perspective on stemming, which can also be generalized to other agglutinative languages such as Finnish, Hungarian, Estonian and Czech.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dinçer, B.T., Karaoğlan, B. (2003). Stemming in Agglutinative Languages: A Probabilistic Stemmer for Turkish. In: Yazıcı, A., Şener, C. (eds) Computer and Information Sciences - ISCIS 2003. ISCIS 2003. Lecture Notes in Computer Science, vol 2869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39737-3_31
DOI: https://doi.org/10.1007/978-3-540-39737-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20409-1
Online ISBN: 978-3-540-39737-3
eBook Packages: Springer Book Archive