Abstract
Short queries have been observed to generally perform better than their corresponding long versions when run against the same IR model, mainly because most current models do not distinguish the importance of different query terms. Observing that sentence-like queries encode information about term importance in their grammatical structure, we propose a Hidden Markov Model (HMM) based method that extracts this information for term weighting. The choice of HMM is motivated by its successful application to capturing the relationship between adjacent terms in NLP. Since we are dealing with natural-language queries, we expect that an HMM can also capture the dependence between term weights and grammatical structure. Our experiments show that this assumption is reasonable and that such information, when used properly, can greatly improve retrieval performance.
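To make the idea in the abstract concrete, the following is a minimal sketch of how an HMM could assign weight levels to query terms from their grammatical roles. It is not the authors' model: the hidden states (two coarse weight levels), the observations (part-of-speech tags), every probability value, and the names `STATES`, `START`, `TRANS`, `EMIT`, `LEVEL_WEIGHT`, and `viterbi` are assumptions chosen purely for illustration. Standard Viterbi decoding then picks the most probable weight-level sequence for a tagged query.

```python
import math

# Hypothetical setup: hidden states are coarse term-weight levels and
# observations are part-of-speech tags of the query terms.
STATES = ["low", "high"]

# All probabilities below are made-up illustrative values, not from the paper.
START = {"low": 0.6, "high": 0.4}
TRANS = {
    "low": {"low": 0.5, "high": 0.5},
    "high": {"low": 0.6, "high": 0.4},
}
EMIT = {
    "low": {"DT": 0.4, "IN": 0.3, "VB": 0.2, "NN": 0.1},
    "high": {"DT": 0.05, "IN": 0.05, "VB": 0.2, "NN": 0.7},
}

# Hypothetical mapping from a decoded level to a numeric term weight.
LEVEL_WEIGHT = {"low": 0.2, "high": 1.0}


def viterbi(tags):
    """Return the most probable weight-level sequence for a POS-tag sequence."""
    # best[i][s] = (log-probability of the best path ending in s at i, backpointer)
    best = [{
        s: (math.log(START[s]) + math.log(EMIT[s][tags[0]]), None)
        for s in STATES
    }]
    for tag in tags[1:]:
        column = {}
        for s in STATES:
            # Pick the best predecessor state for s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(EMIT[s][tag]), prev)
        best.append(column)

    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Example query with hand-assigned POS tags (an assumption; a tagger would
# normally supply these).
query = ["find", "reports", "on", "the", "economy"]
tags = ["VB", "NN", "IN", "DT", "NN"]
levels = viterbi(tags)
weights = [LEVEL_WEIGHT[level] for level in levels]
weighted_query = list(zip(query, weights))
```

In a real system the transition and emission probabilities would be estimated from training data rather than set by hand, and the resulting weights would be passed to the retrieval model's weighted-query operator.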
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yan, X., Gao, G., Su, X., Wei, H., Zhang, X., Lu, Q. (2012). Hidden Markov Model for Term Weighting in Verbose Queries. In: Catarci, T., Forner, P., Hiemstra, D., Peñas, A., Santucci, G. (eds) Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics. CLEF 2012. Lecture Notes in Computer Science, vol 7488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33247-0_10
DOI: https://doi.org/10.1007/978-3-642-33247-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33246-3
Online ISBN: 978-3-642-33247-0
eBook Packages: Computer Science, Computer Science (R0)