Abstract
This year, the SICS team has concentrated on query processing and on the internal topical structure of the query, specifically compound translation. Compound translation is non-trivial because of dependencies between compound elements. We have investigated topical dependencies between query terms: if a query term is non-topical or noise, it should be discarded or given a low weight when ranking retrieved documents; if a query term shows high topicality, its weight should be boosted. The two experiments described here are based on analysis of the distributional character of query terms: one uses the similarity of occurrence contexts between query terms globally across the entire collection; the other uses the likelihood that individual terms appear topically in individual texts. The two boosting schemes are complementary, and both delivered improved results.
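The second scheme, boosting terms that tend to appear topically in individual texts, can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything below is an assumption made for illustration: topicality is approximated as the fraction of documents containing a term that contain it at least twice, and the names `topicality`, `weight_query` and `boost` are hypothetical rather than taken from the paper.

```python
from collections import Counter


def topicality(term, docs):
    """Assumed proxy: share of documents containing `term` that repeat it.

    A term that recurs within the documents it appears in is treated as
    topical; a term that appears only once per document is not.
    """
    counts = [Counter(doc)[term] for doc in docs]
    containing = sum(1 for c in counts if c >= 1)
    repeated = sum(1 for c in counts if c >= 2)
    return repeated / containing if containing else 0.0


def weight_query(query_terms, docs, boost=1.0):
    """Give each query term a weight of 1 + boost * topicality (hypothetical scheme)."""
    return {t: 1.0 + boost * topicality(t, docs) for t in query_terms}


# Toy collection: each document is a list of tokens.
docs = [
    "tax reform and more tax debate".split(),
    "a report on the election".split(),
    "tax policy report".split(),
]

weights = weight_query(["tax", "report"], docs)
```

In this toy collection "tax" is repeated in one of the two documents that contain it, so it is boosted, while "report" never recurs within a document and keeps the baseline weight. A real system would combine such weights with its ranking function; how the paper does so is not stated in the abstract.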
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Karlgren, J., Sahlgren, M., Cöster, R. (2006). Weighting Query Terms Based on Distributional Statistics. In: Peters, C., et al. Accessing Multilingual Information Repositories. CLEF 2005. Lecture Notes in Computer Science, vol 4022. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11878773_24
DOI: https://doi.org/10.1007/11878773_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45697-1
Online ISBN: 978-3-540-45700-8
eBook Packages: Computer Science (R0)