research-article

Query Performance Prediction Using Reference Lists

Authors:
Anna Shtok, Technion—Israel Institute of Technology, Haifa, Israel
Oren Kurland, Technion—Israel Institute of Technology, Haifa, Israel
David Carmel, Yahoo Lab, Haifa, Haifa, Israel

ACM Transactions on Information Systems, Volume 34, Issue 4, Article No. 19, pp. 1–34. https://doi.org/10.1145/2926790

Published: 09 June 2016

Abstract

The task of query performance prediction is to estimate the effectiveness of a search performed in response to a query when no relevance judgments are available. We present a novel probabilistic analysis of the performance prediction task. The analysis gives rise to a general prediction framework that uses pseudo-effective or pseudo-ineffective document lists that are retrieved in response to the query. These lists serve as references for the result list at hand, the effectiveness of which we want to predict. We show that many previously proposed prediction methods can be explained using our framework. More generally, we shed new light on existing prediction methods and establish formal common ground for seemingly different prediction approaches. In addition, we formally demonstrate the connection between prediction using reference lists and fusion of retrieved lists, and provide empirical support for this connection. Through an extensive empirical exploration, we study various factors that affect the quality of prediction using reference lists.
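
As a rough illustration of the idea in the abstract (comparing the result list at hand with reference lists retrieved for the same query), the following minimal sketch scores a result list by how much it resembles a pseudo-effective list and how little it resembles a pseudo-ineffective one. This is not the estimator derived in the article: the Jaccard overlap measure, the cutoff `k`, the subtraction used to combine the two similarities, and the function names `overlap_at_k` and `predict_from_reference_lists` are all assumptions made for this example.

```python
from typing import Sequence


def overlap_at_k(list_a: Sequence[str], list_b: Sequence[str], k: int = 10) -> float:
    """Jaccard overlap between the top-k document ids of two ranked lists.

    Assumption: set overlap stands in for whatever inter-list similarity
    measure a real predictor would use.
    """
    top_a, top_b = set(list_a[:k]), set(list_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


def predict_from_reference_lists(
    result_list: Sequence[str],
    pseudo_effective: Sequence[str],
    pseudo_ineffective: Sequence[str],
    k: int = 10,
) -> float:
    """Hypothetical prediction value: higher means the result list looks
    more like the pseudo-effective reference and less like the
    pseudo-ineffective one. The linear combination is an assumption."""
    return overlap_at_k(result_list, pseudo_effective, k) - overlap_at_k(
        result_list, pseudo_ineffective, k
    )


if __name__ == "__main__":
    # Toy document ids; not data from the article.
    ranked = ["d1", "d2", "d3", "d4"]
    effective_ref = ["d1", "d2", "d3", "d9"]
    ineffective_ref = ["d7", "d8", "d9", "d4"]
    print(predict_from_reference_lists(ranked, effective_ref, ineffective_ref, k=4))
```

The value is only meaningful relative to the values obtained for other queries, which matches how prediction quality is usually assessed: by correlating predicted values with true effectiveness across a set of queries.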

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals


Published in
ACM Transactions on Information Systems Volume 34, Issue 4
September 2016
217 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/2954381
Editor:
Maarten de Rijke
University of Amsterdam, The Netherlands
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 June 2016
- Revised: 1 November 2015
- Accepted: 1 November 2015
- Received: 1 February 2015
Author Tags
Query performance prediction
reference lists
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 35
- Total Downloads: 553
- Downloads (Last 12 months): 34
- Downloads (Last 6 weeks): 8