Automatic Generation of Frequent Case Forms of Query Keywords in Text Retrieval

Kettunen, Kimmo

doi:10.1007/978-3-540-85287-2_22


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Included in the following conference series:

International Conference on Natural Language Processing


Abstract

This paper presents implementations of a generative method for managing morphological variation of query keywords. The method, called FCG (Frequent Case Generation), is based on the skewed distributions of word forms in natural languages. It is suitable for languages that have either a fair amount of morphological variation or very rich morphology. The paper reports the implementation and evaluation of automatic procedures for generating variant forms of query keywords, tested with short and long queries on the CLEF collections for English, Finnish, German and Swedish. The evaluated languages show varying degrees of morphological complexity.
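The core idea of FCG, as summarised above, is that a few case forms account for most occurrences of a word in running text, so a query keyword can be expanded into just those frequent forms. The Python sketch below illustrates that idea under explicit assumptions. The case frequencies in `CASE_FREQ` are invented for illustration and are not figures from the paper. `TOY_PARADIGM` is a hypothetical hand-written table for the Finnish noun *talo* ('house') that stands in for a real morphological generator. The output uses the `#syn(...)` synonym operator of the Indri query language (Lemur Toolkit) as one plausible way to group the variants. The paper's actual case selection and query structure may differ.

```python
from typing import Dict, List

# Hypothetical relative frequencies of Finnish noun case forms in running text.
# Illustrative numbers only; NOT figures reported in the paper.
CASE_FREQ: Dict[str, float] = {
    "nominative": 0.30,
    "genitive": 0.22,
    "partitive": 0.15,
    "inessive": 0.08,
    "illative": 0.06,
    "elative": 0.05,
    "adessive": 0.05,
}

# Hypothetical toy paradigm standing in for a real morphological generator.
TOY_PARADIGM: Dict[str, Dict[str, str]] = {
    "talo": {
        "nominative": "talo",
        "genitive": "talon",
        "partitive": "taloa",
        "inessive": "talossa",
        "illative": "taloon",
        "elative": "talosta",
        "adessive": "talolla",
    }
}


def frequent_cases(freqs: Dict[str, float], coverage: float) -> List[str]:
    """Pick the most frequent cases until their summed share reaches `coverage`.

    Exploits the skewed distribution: a few cases cover most occurrences.
    """
    chosen: List[str] = []
    total = 0.0
    for case, share in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
        if total >= coverage:
            break
        chosen.append(case)
        total += share
    return chosen


def fcg_query(keyword: str, cases: List[str],
              paradigm: Dict[str, Dict[str, str]]) -> str:
    """Generate the selected case forms of `keyword` and group them with #syn."""
    forms = [paradigm[keyword][c] for c in cases if c in paradigm[keyword]]
    return "#syn(" + " ".join(forms) + ")"


if __name__ == "__main__":
    cases = frequent_cases(CASE_FREQ, coverage=0.6)
    print(fcg_query("talo", cases, TOY_PARADIGM))
```

With a 60% coverage threshold the sketch keeps the three most frequent cases and prints `#syn(talo talon taloa)`. Raising the threshold adds more forms at the cost of longer queries, which is the trade-off a frequent-case approach tries to keep small.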



Author information

Authors and Affiliations

Department of Information Studies, University of Tampere, Kanslerinrinne 1, FIN-33014, Tampere, Finland
Kimmo Kettunen


Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 41296, Göteborg, Sweden
Bengt Nordström & Aarne Ranta


About this paper

Cite this paper

Kettunen, K. (2008). Automatic Generation of Frequent Case Forms of Query Keywords in Text Retrieval. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_22


DOI: https://doi.org/10.1007/978-3-540-85287-2_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)
