Back to the roots: a probabilistic framework for query-performance prediction

Authors:

Ofri Rom

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

Pages 823 - 832

https://doi.org/10.1145/2396761.2396866

Published: 29 October 2012

Abstract

The query-performance prediction task is to estimate the effectiveness of a search performed in response to a query when no relevance judgments are available. Although many effective prediction methods exist, they differ substantially in their basic principles and rely on diverse hypotheses about the characteristics of effective retrieval. We present a novel, fundamental probabilistic prediction framework. Using the framework, we derive and explain various previously proposed prediction methods that might seem completely different but turn out to share the same formal basis. The derivations provide new perspectives on several predictors (e.g., Clarity). The framework is also used to devise new prediction approaches that outperform the state of the art.
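
To make the Clarity predictor mentioned in the abstract concrete, the following is a minimal sketch of how it is commonly described: the KL divergence between a language model induced from the top-retrieved documents and the language model of the whole collection. It is an illustration under stated assumptions, not the derivation given in the paper. The function names (`unigram_model`, `clarity`), the unsmoothed maximum-likelihood estimates, and the use of normalized retrieval scores as document weights are all assumptions made here for brevity.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model of a non-empty token list (no smoothing)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def clarity(top_docs, doc_weights, collection_docs):
    """KL divergence (in bits) between a result-list model and the collection model.

    Assumptions of this sketch: every document is a non-empty list of tokens,
    doc_weights are positive retrieval scores for top_docs, and every top
    document also belongs to collection_docs, so each term with non-zero
    result-list probability has non-zero collection probability.
    """
    collection_model = unigram_model(
        [term for doc in collection_docs for term in doc]
    )

    # Result-list model: mixture of document models, weighted by normalized scores.
    weight_sum = sum(doc_weights)
    result_model = Counter()
    for tokens, weight in zip(top_docs, doc_weights):
        for term, prob in unigram_model(tokens).items():
            result_model[term] += (weight / weight_sum) * prob

    return sum(
        prob * math.log2(prob / collection_model[term])
        for term, prob in result_model.items()
        if prob > 0
    )


if __name__ == "__main__":
    collection = [["a", "a"], ["b", "b"]]
    # A result list focused on a single term diverges from the collection.
    print(clarity([["a", "a"]], [1.0], collection))
    # A result list that mirrors the collection does not.
    print(clarity(collection, [1.0, 1.0], collection))
```

Under this reading, a higher value means the result list uses language that is more distinctive relative to the collection, which is taken as a sign of a more effective retrieval. A practical implementation would typically smooth the document models and restrict the sum to a truncated vocabulary; both are omitted here.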

Index Terms

Back to the roots: a probabilistic framework for query-performance prediction
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

October 2012

2840 pages

ISBN: 9781450311564

DOI: 10.1145/2396761

General Chair: Xuewen Chen (Wayne State University, USA)

Program Chairs: Guy Lebanon (Georgia Institute of Technology), Haixun Wang (Microsoft Research Asia), Mohammed J. Zaki (Rensselaer Polytechnic Institute)

Copyright © 2012 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

query-performance prediction

Qualifiers

Research-article

Conference

CIKM'12

CIKM'12: 21st ACM International Conference on Information and Knowledge Management

October 29 - November 2, 2012

Maui, Hawaii, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
