Performance of Turkish Information Retrieval: Evaluating the Impact of Linguistic Parameters and Compound Nouns

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Abstract

Turkish is an agglutinative language in which linguistic parameters can significantly affect information retrieval performance. In this paper, several Turkish linguistic parameters (truncation, stemming, stop words, etc.) are studied and their impact on the performance of an information retrieval system is investigated. Word truncation at three fixed lengths (3, 4 and 5 characters) is studied, and the results are compared with those obtained using the Snowball and Zemberek stemmers. Moreover, the effect of using compound nouns, in addition to simple keywords, to index queries and documents is studied. In the experimental part, the Milliyet test collection is tested with three information retrieval models. Performance is compared using the traditional information retrieval metrics and the bpref metric, since the test collection is built on incomplete relevance judgments.
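As a rough illustration of two ideas named in the abstract, the sketch below shows fixed-length truncation and a bpref computation. It is not the authors' implementation: the function names, the example word, and the document identifiers are hypothetical, and the bpref variant shown (normalizing by min(R, N), where R and N are the numbers of judged relevant and judged nonrelevant documents) is an assumption about which formulation is used.

```python
def truncate(word: str, n: int) -> str:
    """Fixed-length truncation: keep only the first n characters of a word."""
    return word[:n]


def bpref(ranking, relevant, nonrelevant):
    """bpref for one query (assumed variant, normalized by min(R, N)).

    ranking     -- list of document ids in retrieved order
    relevant    -- set of ids judged relevant
    nonrelevant -- set of ids judged nonrelevant
    Documents in neither set are unjudged and are ignored.
    """
    r_count, n_count = len(relevant), len(nonrelevant)
    if r_count == 0:
        return 0.0
    bound = min(r_count, n_count)
    nonrel_seen = 0  # judged nonrelevant documents ranked so far
    total = 0.0
    for doc in ranking:
        if doc in relevant:
            if bound == 0:
                # No judged nonrelevant documents: nothing to penalize.
                total += 1.0
            else:
                total += 1.0 - min(nonrel_seen, bound) / bound
        elif doc in nonrelevant:
            nonrel_seen += 1
    return total / r_count


# Hypothetical example: one inflected Turkish word at the three lengths studied.
word = "kitaplarımızdan"
prefixes = [truncate(word, n) for n in (3, 4, 5)]  # ['kit', 'kita', 'kitap']

# Hypothetical ranking with one unjudged document ("u1").
score = bpref(["d1", "u1", "d2", "d3", "d4"], {"d1", "d3"}, {"d2", "d4"})
```

Because unjudged documents are skipped rather than counted as nonrelevant, the score depends only on the relative order of judged documents, which is why bpref suits a collection with incomplete relevance judgments.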

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Haddad, H., Bechikh Ali, C. (2014). Performance of Turkish Information Retrieval: Evaluating the Impact of Linguistic Parameters and Compound Nouns. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_32

  • DOI: https://doi.org/10.1007/978-3-642-54903-8_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science, Computer Science (R0)
