DOI:10.1145/1277741.1277795

Estimation and use of uncertainty in pseudo-relevance feedback

Published: 23 July 2007

Abstract

Existing pseudo-relevance feedback methods typically perform averaging over the top-retrieved documents, but ignore an important statistical dimension: the risk or variance associated with either the individual document models, or their combination. Treating the baseline feedback method as a black box, and the output feedback model as a random variable, we estimate a posterior distribution for the feedback model by resampling a given query's top-retrieved documents, using the posterior mean or mode as the enhanced feedback model. We then perform model combination over several enhanced models, each based on a slightly modified query sampled from the original query. We find that resampling documents helps increase individual feedback model precision by removing noise terms, while sampling from the query improves robustness (worst-case performance) by emphasizing terms related to multiple query aspects. The result is a meta-feedback algorithm that is both more robust and more precise than the original strong baseline method.
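
The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy retriever `retrieve`, the averaging baseline `baseline_feedback`, the leave-one-out scheme in `query_variants`, and all parameter names are hypothetical stand-ins chosen for illustration. The sample mean over bootstrap resamples is used here as a simple proxy for the posterior mean, and plain averaging is assumed for the model-combination step.

```python
import random
from collections import defaultdict

def retrieve(query, corpus, k):
    """Hypothetical toy retriever: rank documents by query-term occurrences."""
    ranked = sorted(corpus, key=lambda doc: -sum(doc.count(t) for t in query))
    return ranked[:k]

def baseline_feedback(docs):
    """Hypothetical black-box baseline: average the documents' unigram models."""
    model = defaultdict(float)
    for doc in docs:
        for term in doc:
            model[term] += 1.0 / (len(doc) * len(docs))
    return dict(model)

def average_models(models):
    """Pointwise mean of term-weight dicts; a missing term counts as weight 0."""
    total = defaultdict(float)
    for m in models:
        for term, weight in m.items():
            total[term] += weight / len(models)
    return dict(total)

def resampled_feedback(docs, n_samples, rng):
    """Bootstrap the top-retrieved documents and average the resulting models.

    The sample mean is an assumed proxy for the posterior mean of the
    feedback model; terms that appear in few resamples get small weights.
    """
    models = []
    for _ in range(n_samples):
        sample = rng.choices(docs, k=len(docs))  # sample with replacement
        models.append(baseline_feedback(sample))
    return average_models(models)

def query_variants(query):
    """Assumed query sampling scheme: the full query plus leave-one-out variants."""
    if len(query) < 2:
        return [list(query)]
    return [list(query)] + [query[:i] + query[i + 1:] for i in range(len(query))]

def meta_feedback(query, corpus, k=3, n_samples=20, seed=0):
    """Combine one enhanced feedback model per query variant by averaging."""
    rng = random.Random(seed)
    enhanced = [
        resampled_feedback(retrieve(variant, corpus, k), n_samples, rng)
        for variant in query_variants(query)
    ]
    return average_models(enhanced)

if __name__ == "__main__":
    corpus = [
        ["jaguar", "car", "speed"],
        ["jaguar", "cat", "jungle"],
        ["car", "engine", "speed"],
        ["stock", "market"],
    ]
    model = meta_feedback(["jaguar", "car"], corpus)
    for term, weight in sorted(model.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{weight:.3f}")
```

Because each baseline model is a normalized distribution and both stages only average, the combined model in this sketch is again a distribution over terms. A faithful reproduction would replace the averaging steps with the posterior estimation and model combination actually used in the paper.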

Published In

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN:9781595935977
DOI:10.1145/1277741

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. pseudo-relevance feedback
  2. query expansion

Qualifiers

  • Article

Conference

SIGIR07
SIGIR07: The 30th Annual International SIGIR Conference
July 23 - 27, 2007
Amsterdam, The Netherlands

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

  • (2025) Knowledge graph based entity selection framework for ad-hoc retrieval. Web Semantics: Science, Services and Agents on the World Wide Web 84:C. DOI: 10.1016/j.websem.2024.100848. Online publication date: 18-Feb-2025
  • (2023) Personalized Query Expansion with Contextual Word Embeddings. ACM Transactions on Information Systems 42:2 (1-35). DOI: 10.1145/3624988. Online publication date: 20-Sep-2023
  • (2023) Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (139-149). DOI: 10.1145/3624918.3625333. Online publication date: 26-Nov-2023
  • (2023) Selecting which Dense Retriever to use for Zero-Shot Search. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (223-233). DOI: 10.1145/3624918.3625330. Online publication date: 26-Nov-2023
  • (2023) Multivariate Representation Learning for Information Retrieval. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (163-173). DOI: 10.1145/3539618.3591740. Online publication date: 19-Jul-2023
  • (2022) LoL. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (825-836). DOI: 10.1145/3477495.3532017. Online publication date: 6-Jul-2022
  • (2022) Neural Network Guided Fast and Efficient Query-Based Stemming by Predicting Term Co-occurrence Statistics. SN Computer Science 3:3. DOI: 10.1007/s42979-022-01081-5. Online publication date: 24-Mar-2022
  • (2021) NewsLink: Empowering Intuitive News Search with Knowledge Graphs. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (876-887). DOI: 10.1109/ICDE51399.2021.00081. Online publication date: Apr-2021
  • (2021) PGT: Pseudo Relevance Feedback Using a Graph-Based Transformer. Advances in Information Retrieval (440-447). DOI: 10.1007/978-3-030-72240-1_46. Online publication date: 30-Mar-2021
  • (2019) Relevance Feedback. ACM Transactions on Information Systems 37:4 (1-28). DOI: 10.1145/3360487. Online publication date: 4-Oct-2019
