ABSTRACT
Pseudo-relevance feedback has proven effective for improving average retrieval performance. Unfortunately, many experiments have shown that although pseudo-relevance feedback helps many queries, it often hurts many others, limiting its usefulness in real retrieval applications. An important yet difficult challenge is therefore to improve the overall effectiveness of pseudo-relevance feedback without sacrificing the performance of individual queries too much. In this paper, we propose a novel learning algorithm, FeedbackBoost, based on the boosting framework, which improves pseudo-relevance feedback by optimizing the combination of a set of basis feedback algorithms under a loss function defined to directly measure both robustness and effectiveness. FeedbackBoost can potentially accommodate many basis feedback methods as features in the model, making the proposed method a general optimization framework for pseudo-relevance feedback. As an application, we apply FeedbackBoost to improve pseudo feedback based on language models by combining different document weighting strategies. The experimental results demonstrate that FeedbackBoost achieves better average precision while dramatically reducing the number and magnitude of feedback failures, as compared to three representative pseudo feedback methods and a standard learning-to-rank approach for pseudo feedback.
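The core idea of the abstract can be illustrated with a minimal sketch: greedily combine several basis feedback methods, at each boosting round picking the method and weight that most reduce a loss rewarding average gain (effectiveness) while penalizing per-query failures (robustness). Everything below is an assumption for illustration: the method names, the toy per-query gains, and the exponential form of the failure penalty are hypothetical and are not the paper's actual algorithm or data.

```python
import math

# Toy setup: 3 hypothetical basis feedback methods over 4 queries.
# basis_gains[m][q] = change in average precision on query q when
# method m is applied (positive = helps, negative = hurts).
# All numbers are illustrative, not taken from the paper.
basis_gains = {
    "rocchio_uniform": [0.04, -0.02, 0.05, 0.01],
    "score_weighted":  [0.02,  0.03, -0.01, 0.02],
    "rank_weighted":   [-0.01, 0.02,  0.04, 0.03],
}
n_queries = 4

def loss(combined):
    # Assumed loss: an exponential penalty on per-query failures
    # (robustness) minus the average gain (effectiveness). The exact
    # functional form in the paper differs; this only mirrors the
    # "robustness + effectiveness" spirit.
    avg_gain = sum(combined) / len(combined)
    failures = sum(math.exp(-g) for g in combined)
    return failures - avg_gain

# Boosting loop: in each round, greedily add the (method, alpha) pair
# whose weighted contribution most reduces the loss, using a coarse
# line search over a few candidate weights.
combined = [0.0] * n_queries   # current per-query gain of the ensemble
model = []                     # learned list of (method, alpha) pairs
for _ in range(5):
    best = None
    for name, gains in basis_gains.items():
        for alpha in (0.1, 0.2, 0.5, 1.0):
            cand = [c + alpha * g for c, g in zip(combined, gains)]
            l = loss(cand)
            if best is None or l < best[0]:
                best = (l, name, alpha, cand)
    _, name, alpha, combined = best
    model.append((name, alpha))

print(model)
print([round(g, 3) for g in combined])
```

The greedy round structure is what makes this a boosting-style combination: each basis method plays the role of a weak learner, and the loss keeps the ensemble from trading a few badly hurt queries for a higher average.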