Abstract
Turkish is an agglutinative language in which linguistic parameters can significantly affect information retrieval performance. In this paper, several Turkish linguistic parameters (truncation, stemming, stop words, etc.) are studied and their impact on the performance of an information retrieval system is investigated. Three fixed-length word truncations (3, 4 and 5 characters) are studied, and the results are compared with those obtained using the Snowball and Zemberek stemmers. Moreover, the effect of using compound nouns, in addition to simple keywords, to index queries and documents is studied. In the experimental part, the Milliyet test collection is tested with three information retrieval models. Performance is compared using the traditional information retrieval metrics and the bpref metric, since the test collection is built on incomplete relevance judgments.
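As an illustration only, the two most mechanical ideas in the abstract, fixed-length truncation and the bpref metric, can be sketched in a few lines of Python. This is not the authors' implementation: the function names `truncate` and `bpref`, the sample words, and the toy document identifiers are all assumptions made for the example. The `bpref` function follows the commonly cited Buckley and Voorhees formulation, in which unjudged documents are ignored and the count of judged non-relevant documents ranked above a relevant one is capped at R, the number of judged relevant documents. Evaluation tools may use slightly different variants.

```python
def truncate(word, n):
    """Keep only the first n characters of a word (fixed-length truncation)."""
    return word[:n]


def bpref(ranking, relevant, nonrelevant):
    """Sketch of bpref for one query.

    ranking     -- list of document ids, best first (hypothetical ids)
    relevant    -- set of ids judged relevant
    nonrelevant -- set of ids judged non-relevant
    Documents in neither set are unjudged and are skipped.
    """
    R = len(relevant)
    if R == 0:
        return 0.0
    nonrel_seen = 0  # judged non-relevant documents met so far
    total = 0.0
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_seen += 1
        elif doc in relevant:
            # Penalise by the share of non-relevant documents ranked above,
            # counting at most R of them.
            total += 1.0 - min(nonrel_seen, R) / R
    # Relevant documents that were never retrieved contribute 0.
    return total / R


if __name__ == "__main__":
    # "kitaplarımızdan" (roughly "from our books") shares the stem "kitap".
    for n in (3, 4, 5):
        print(n, truncate("kitaplarımızdan", n))

    ranking = ["d1", "d2", "d3", "d4", "d5"]  # d3 is unjudged
    print(bpref(ranking, {"d1", "d4"}, {"d2", "d5"}))
```

Truncation needs no linguistic resources, which is why it is a natural baseline against dictionary- or rule-based stemmers. Because bpref uses only judged documents, it does not treat an unjudged document as non-relevant, which matters when the relevance judgments are incomplete.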
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Haddad, H., Bechikh Ali, C. (2014). Performance of Turkish Information Retrieval: Evaluating the Impact of Linguistic Parameters and Compound Nouns. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_32
DOI: https://doi.org/10.1007/978-3-642-54903-8_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)