Article

Estimation and use of uncertainty in pseudo-relevance feedback

Authors:

Kevyn Collins-Thompson,

Jamie Callan

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 303 - 310

https://doi.org/10.1145/1277741.1277795

Published: 23 July 2007

Abstract

Existing pseudo-relevance feedback methods typically perform averaging over the top-retrieved documents, but ignore an important statistical dimension: the risk or variance associated with either the individual document models, or their combination. Treating the baseline feedback method as a black box, and the output feedback model as a random variable, we estimate a posterior distribution for the feedback model by resampling a given query's top-retrieved documents, using the posterior mean or mode as the enhanced feedback model. We then perform model combination over several enhanced models, each based on a slightly modified query sampled from the original query. We find that resampling documents helps increase individual feedback model precision by removing noise terms, while sampling from the query improves robustness (worst-case performance) by emphasizing terms related to multiple query aspects. The result is a meta-feedback algorithm that is both more robust and more precise than the original strong baseline method.
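The document-resampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes bootstrap sampling with replacement, a plain arithmetic mean as the point estimate, and a toy stand-in for the black-box feedback method. The names `baseline_feedback` and `resampled_feedback`, and the parameters `num_samples` and `seed`, are hypothetical. Documents are assumed to be non-empty lists of tokens. The query-sampling and model-combination steps are not covered.

```python
import random
from collections import Counter


def baseline_feedback(docs):
    """Hypothetical stand-in for the black-box feedback method.

    Averages the unigram distributions of the given documents, so the
    returned term weights sum to 1.
    """
    model = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            model[term] += count / len(doc) / len(docs)
    return dict(model)


def resampled_feedback(docs, num_samples=30, seed=0):
    """Estimate the mean and variance of the feedback model by resampling.

    Each round draws len(docs) documents with replacement (an assumed
    bootstrap scheme), runs the black-box method on the sample, and
    accumulates per-term first and second moments.
    """
    rng = random.Random(seed)
    sums = Counter()
    sq_sums = Counter()
    for _ in range(num_samples):
        sample = [rng.choice(docs) for _ in docs]
        for term, p in baseline_feedback(sample).items():
            sums[term] += p
            sq_sums[term] += p * p
    # A term missing from a sampled model has weight 0 in that round,
    # so dividing by num_samples gives the correct moments.
    mean = {t: s / num_samples for t, s in sums.items()}
    var = {t: max(sq_sums[t] / num_samples - mean[t] ** 2, 0.0) for t in mean}
    return mean, var


if __name__ == "__main__":
    top_docs = [
        ["feedback", "model", "query", "feedback"],
        ["query", "expansion", "noise"],
        ["feedback", "query", "model", "model"],
    ]
    mean, var = resampled_feedback(top_docs)
    # List terms by estimated weight, with the variance across resamples.
    for term in sorted(mean, key=mean.get, reverse=True):
        print(f"{term:10s} mean={mean[term]:.3f} var={var[term]:.4f}")
```

In this sketch, a term whose weight varies strongly across resamples depends on a few documents, which is one way to read the abstract's remark about noise terms. How the variance is actually used to form the enhanced model is left to the paper itself.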

Index Terms

Estimation and use of uncertainty in pseudo-relevance feedback
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Relevance assessment
    2. Retrieval models and ranking

Published In

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

July 2007

946 pages

ISBN:9781595935977

DOI:10.1145/1277741

General Chairs:
Wessel Kraaij (TNO, The Netherlands)
Arjen P. de Vries (CWI, The Netherlands)

Program Chairs:
Charles L. A. Clarke (University of Waterloo, Canada)
Norbert Fuhr (University of Duisburg-Essen, Germany)
Noriko Kando (National Institute of Informatics, Japan)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR07

Sponsor:

SIGIR07: The 30th Annual International SIGIR Conference

July 23 - 27, 2007

Amsterdam, The Netherlands

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
