Stemming in Agglutinative Languages: A Probabilistic Stemmer for Turkish

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2869)

Abstract

In this paper, we introduce a new lexicon-free, probabilistic stemmer for use in a Turkish information retrieval system that is under development. It has linear computational complexity and achieves a success rate of 95.8% in testing. The main contribution of this paper is a thorough description of a probabilistic perspective on stemming, which can also be generalized to other agglutinative languages such as Finnish, Hungarian, Estonian and Czech.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dinçer, B.T., Karaoğlan, B. (2003). Stemming in Agglutinative Languages: A Probabilistic Stemmer for Turkish. In: Yazıcı, A., Şener, C. (eds) Computer and Information Sciences - ISCIS 2003. ISCIS 2003. Lecture Notes in Computer Science, vol 2869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39737-3_31

  • DOI: https://doi.org/10.1007/978-3-540-39737-3_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20409-1

  • Online ISBN: 978-3-540-39737-3

  • eBook Packages: Springer Book Archive
