Automatic Generation of Frequent Case Forms of Query Keywords in Text Retrieval

  • Conference paper
Advances in Natural Language Processing (GoTAL 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Abstract

This paper presents implementations of a generative method for managing morphological variation of query keywords. The method is called FCG, Frequent Case Generation. It is based on the skewed distributions of word forms in natural languages and is suitable for languages that either have a fair amount of morphological variation or are morphologically very rich. The paper reports the implementation and evaluation of automatic procedures for generating variant query keyword forms, using short and long queries from the CLEF collections for English, Finnish, German and Swedish. The evaluated languages show varying degrees of morphological complexity.
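The idea of exploiting a skewed distribution of word forms can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `frequent_forms` and `expand_query`, the `coverage` threshold, and the counts in `TOY_COUNTS` are all hypothetical and chosen only for illustration. The sketch assumes that, for each keyword, corpus counts of its inflected forms are already available, and keeps only the most frequent forms until a chosen share of all occurrences is covered.

```python
from collections import Counter

# Hypothetical, made-up counts for inflected forms of the Finnish noun
# "talo" (house). Real counts would come from a corpus study.
TOY_COUNTS = {
    "talo": {
        "talo": 40, "talon": 30, "taloa": 15,
        "talossa": 8, "taloon": 4, "talosta": 3,
    },
}


def frequent_forms(form_counts, coverage=0.8):
    """Return the most frequent forms whose cumulative share of all
    occurrences reaches `coverage` (an assumed cut-off criterion)."""
    total = sum(form_counts.values())
    if total == 0:
        return []
    selected, covered = [], 0
    # most_common() yields forms from most to least frequent.
    for form, count in Counter(form_counts).most_common():
        selected.append(form)
        covered += count
        if covered / total >= coverage:
            break
    return selected


def expand_query(keywords, forms_by_keyword, coverage=0.8):
    """Replace each keyword with a group of its frequent forms.
    Keywords without known forms are kept unchanged."""
    expanded = []
    for keyword in keywords:
        forms = frequent_forms(forms_by_keyword.get(keyword, {}), coverage)
        expanded.append(forms or [keyword])
    return expanded


if __name__ == "__main__":
    print(expand_query(["talo", "auto"], TOY_COUNTS))
```

With the toy counts, three of the six listed forms already account for 85 percent of the occurrences, so a query on "talo" would be expanded to only those three forms. Each inner list could then be treated as a group of synonyms by the retrieval system; how such groups are expressed depends on the query language in use.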



Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kettunen, K. (2008). Automatic Generation of Frequent Case Forms of Query Keywords in Text Retrieval. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_22

  • DOI: https://doi.org/10.1007/978-3-540-85287-2_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85286-5

  • Online ISBN: 978-3-540-85287-2

  • eBook Packages: Computer Science, Computer Science (R0)
