
Practical Online Retrieval Evaluation

  • Conference paper
Advances in Information Retrieval (ECIR 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7814)


Abstract

Online evaluation allows the assessment of information retrieval (IR) techniques based on how real users respond to them. Because this technique is directly based on observed user behavior, it is a promising alternative to traditional offline evaluation, which is based on manual relevance assessments. In particular, online evaluation can enable comparisons in settings where reliable assessments are difficult to obtain (e.g., personalized search) or expensive (e.g., for search by trained experts in specialized collections).

Despite its advantages, and its successful use in commercial settings, online evaluation is rarely employed outside of large commercial search engines due to a perception that it is impractical at small scales. The goal of this tutorial is to show how online evaluations can be conducted in such settings, demonstrate software to facilitate its use, and promote further research in the area. We will also contrast online evaluation with standard offline evaluation, and provide an overview of online approaches.
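
To make the idea concrete, the sketch below illustrates one widely used family of online approaches, interleaved comparison, in Python. Two systems' rankings are merged into a single result list, the merged list is shown to the user, and whichever system contributed the clicked results receives credit for that query. This is a minimal illustrative sketch, not code from the tutorial or its accompanying software; the function names, toy document ids, and the simple click-counting outcome are assumptions chosen for clarity.

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Merge two rankings (lists of document ids) into one interleaved
    list, recording which ranker contributed each document."""
    interleaved, team_of = [], {}
    picks_a = picks_b = 0
    while True:
        # Highest-ranked document from each list that is not yet placed.
        next_a = next((d for d in ranking_a if d not in team_of), None)
        next_b = next((d for d in ranking_b if d not in team_of), None)
        if next_a is None and next_b is None:
            break
        # The ranker with fewer picks goes next; ties are broken randomly
        # so neither system systematically gets the higher positions.
        a_turn = picks_a < picks_b or (picks_a == picks_b and random.random() < 0.5)
        if next_b is None or (a_turn and next_a is not None):
            team_of[next_a] = "A"
            interleaved.append(next_a)
            picks_a += 1
        else:
            team_of[next_b] = "B"
            interleaved.append(next_b)
            picks_b += 1
    return interleaved, team_of

def interleaved_outcome(team_of, clicked_docs):
    """Score one query impression from observed clicks:
    +1 if ranker A received more clicks, -1 if B did, 0 for a tie."""
    clicks_a = sum(1 for d in clicked_docs if team_of.get(d) == "A")
    clicks_b = sum(1 for d in clicked_docs if team_of.get(d) == "B")
    return (clicks_a > clicks_b) - (clicks_a < clicks_b)

# Toy usage: the click on d4 credits ranker B for this impression.
shown, team_of = team_draft_interleave(["d1", "d2", "d3"], ["d3", "d4", "d1"])
print(shown, interleaved_outcome(team_of, {"d4"}))  # outcome is -1
```

Aggregating this per-impression outcome over many queries yields a paired comparison between the two rankers based solely on observed clicks, with no manual relevance judgments required.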





Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Radlinski, F., Hofmann, K. (2013). Practical Online Retrieval Evaluation. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_107


  • DOI: https://doi.org/10.1007/978-3-642-36973-5_107

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36972-8

  • Online ISBN: 978-3-642-36973-5

  • eBook Packages: Computer Science, Computer Science (R0)
