ABSTRACT
Pseudo-relevance feedback has proven effective for improving average retrieval performance. Unfortunately, many experiments have shown that although pseudo-relevance feedback helps many queries, it often hurts many others, limiting its usefulness in real retrieval applications. An important yet difficult challenge is therefore to improve the overall effectiveness of pseudo-relevance feedback without sacrificing the performance of individual queries too much. In this paper, we propose a novel learning algorithm, FeedbackBoost, based on the boosting framework, which improves pseudo-relevance feedback by optimizing the combination of a set of basis feedback algorithms under a loss function defined to directly measure both robustness and effectiveness. FeedbackBoost can potentially accommodate many basis feedback methods as features in the model, making the proposed method a general optimization framework for pseudo-relevance feedback. As an application, we apply FeedbackBoost to improve pseudo feedback based on language models by combining different document weighting strategies. The experimental results demonstrate that FeedbackBoost achieves better average precision while dramatically reducing the number and magnitude of feedback failures, as compared to three representative pseudo feedback methods and a standard learning-to-rank approach for pseudo feedback.
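The core idea of the abstract can be illustrated with a minimal sketch: greedily combine several basis feedback methods, at each boosting round picking the method and weight that most reduce a loss rewarding average gain (effectiveness) while penalizing per-query failures (robustness). Everything below is an assumption for illustration: the method names, the toy per-query gains, and the exponential form of the failure penalty are hypothetical and are not the paper's actual algorithm or data.

```python
import math

# Toy setup: 3 hypothetical basis feedback methods over 4 queries.
# basis_gains[m][q] = change in average precision on query q when
# method m is applied (positive = helps, negative = hurts).
# All numbers are illustrative, not taken from the paper.
basis_gains = {
    "rocchio_uniform": [0.04, -0.02, 0.05, 0.01],
    "score_weighted":  [0.02,  0.03, -0.01, 0.02],
    "rank_weighted":   [-0.01, 0.02,  0.04, 0.03],
}
n_queries = 4

def loss(combined):
    # Assumed loss: an exponential penalty on per-query failures
    # (robustness) minus the average gain (effectiveness). The exact
    # functional form in the paper differs; this only mirrors the
    # "robustness + effectiveness" spirit.
    avg_gain = sum(combined) / len(combined)
    failures = sum(math.exp(-g) for g in combined)
    return failures - avg_gain

# Boosting loop: in each round, greedily add the (method, alpha) pair
# whose weighted contribution most reduces the loss, using a coarse
# line search over a few candidate weights.
combined = [0.0] * n_queries   # current per-query gain of the ensemble
model = []                     # learned list of (method, alpha) pairs
for _ in range(5):
    best = None
    for name, gains in basis_gains.items():
        for alpha in (0.1, 0.2, 0.5, 1.0):
            cand = [c + alpha * g for c, g in zip(combined, gains)]
            l = loss(cand)
            if best is None or l < best[0]:
                best = (l, name, alpha, cand)
    _, name, alpha, combined = best
    model.append((name, alpha))

print(model)
print([round(g, 3) for g in combined])
```

The greedy round structure is what makes this a boosting-style combination: each basis method plays the role of a weak learner, and the loss keeps the ensemble from trading a few badly hurt queries for a higher average.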