Abstract
Modelling the distribution of document scores returned from an information retrieval (IR) system in response to a query is of both theoretical and practical importance. One of the goals of modelling document scores in this manner is the inference of document relevance. There has been renewed interest of late in modelling document scores using parameterised distributions. Consequently, a number of hypotheses have been proposed to constrain the mixture distribution from which document scores could be drawn.
In this article, we show how a standard performance measure (i.e., average precision) can be inferred from a document score distribution using labelled data. We use the accuracy of the inference of average precision as a measure for determining the usefulness of a particular model of document scores. We provide a comprehensive study which shows that certain mixtures of distributions are able to infer average precision more accurately than others. Furthermore, we analyse a number of mixture distributions with regard to the recall-fallout convexity hypothesis and show that the convexity hypothesis is practically useful.
Building on one of the best-performing score-distribution models, we then develop techniques for query-performance prediction (QPP) that automatically estimate the parameters of the document score-distribution model when relevance information is unknown. We present experimental results that outline the benefits of this approach to query-performance prediction.
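To make the idea of inferring average precision from a score distribution concrete, the following is a minimal sketch and not the article's own method. It assumes, purely for illustration, that relevant and non-relevant scores each follow a normal distribution fitted to labelled data; the article compares several mixtures, and this choice is not claimed to be the best one. The function names (`normal_sf`, `fit_normal`, `infer_average_precision`) and the integration settings are hypothetical. Average precision is approximated as the integral of precision over recall as the score threshold is lowered.

```python
import math
from statistics import mean, pstdev


def normal_sf(x, mu, sigma):
    """Survival function (1 - CDF) of a normal distribution."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))


def fit_normal(scores):
    """Fit a normal distribution by its sample mean and population std dev.

    Assumes the scores are not all identical (sigma must be > 0).
    """
    return mean(scores), pstdev(scores)


def infer_average_precision(rel_scores, nonrel_scores, steps=2000):
    """Approximate average precision from two fitted score distributions.

    Illustrative assumption: both classes are modelled as normals.
    """
    mu_r, sd_r = fit_normal(rel_scores)
    mu_n, sd_n = fit_normal(nonrel_scores)
    # Mixing weight: proportion of relevant documents in the labelled data.
    lam = len(rel_scores) / (len(rel_scores) + len(nonrel_scores))

    # Sweep the threshold from well above to well below both distributions.
    hi = max(mu_r + 8 * sd_r, mu_n + 8 * sd_n)
    lo = min(mu_r - 8 * sd_r, mu_n - 8 * sd_n)

    ap = 0.0
    prev_recall = 0.0
    for i in range(1, steps + 1):
        s = hi - (hi - lo) * i / steps
        recall = normal_sf(s, mu_r, sd_r)    # P(score > s | relevant)
        fallout = normal_sf(s, mu_n, sd_n)   # P(score > s | non-relevant)
        retrieved = lam * recall + (1 - lam) * fallout
        if retrieved > 0:
            precision = lam * recall / retrieved
            # Accumulate precision weighted by the gain in recall.
            ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy labelled scores, invented for illustration only.
    relevant = [7.1, 8.4, 6.9, 9.0]
    non_relevant = [3.2, 4.1, 5.0, 2.8, 4.4, 3.9]
    print(infer_average_precision(relevant, non_relevant))
```

In this sketch, two identical class distributions give a value close to the proportion of relevant documents, while well-separated distributions give a value close to 1, which matches the intuition that better separation of scores implies better ranking.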
Index Terms
- Document Score Distribution Models for Query Performance Inference and Prediction