DOI: 10.1145/2396761.2396866
research-article

Back to the roots: a probabilistic framework for query-performance prediction

Published: 29 October 2012

ABSTRACT

The query-performance prediction task is to estimate the effectiveness of a search performed in response to a query when no relevance judgments are available. Although many effective prediction methods exist, they differ substantially in their basic principles and rely on diverse hypotheses about the characteristics of effective retrieval. We present a novel fundamental probabilistic prediction framework. Using the framework, we derive and explain various previously proposed prediction methods that might seem completely different but turn out to share the same formal basis. The derivations provide new perspectives on several predictors (e.g., Clarity). The framework is also used to devise new prediction approaches that outperform the state of the art.

    • Published in

      CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
      October 2012
      2840 pages
      ISBN:9781450311564
      DOI:10.1145/2396761

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
