Abstract
Online evaluation allows the assessment of information retrieval (IR) techniques based on how real users respond to them. Because this technique is directly based on observed user behavior, it is a promising alternative to traditional offline evaluation, which is based on manual relevance assessments. In particular, online evaluation can enable comparisons in settings where reliable assessments are difficult to obtain (e.g., personalized search) or expensive (e.g., for search by trained experts in specialized collections).
Despite its advantages, and its successful use in commercial settings, online evaluation is rarely employed outside of large commercial search engines due to a perception that it is impractical at small scales. The goal of this tutorial is to show how online evaluations can be conducted in such settings, demonstrate software to facilitate its use, and promote further research in the area. We will also contrast online evaluation with standard offline evaluation, and provide an overview of online approaches.
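To make the kind of online comparison described above concrete, the following is a minimal sketch of team-draft interleaving, one well-known online evaluation method in this area. The function names, toy document IDs, and click-credit rule are illustrative assumptions for this sketch, not the tutorial's own software or data.

```python
# Minimal sketch of team-draft interleaving: merge the rankings of two systems
# into one result list, remember which system contributed each document, and
# infer a per-impression preference from the user's clicks.
import random

def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Merge two rankings into a single interleaved list, recording which
    system ("team") contributed each document."""
    interleaved, team_a, team_b = [], set(), set()
    while len(interleaved) < len(set(ranking_a) | set(ranking_b)):
        # The team with fewer contributions picks next; a coin flip breaks ties.
        pick_a = len(team_a) < len(team_b) or (
            len(team_a) == len(team_b) and rng.random() < 0.5)
        for ranking, team in ([(ranking_a, team_a), (ranking_b, team_b)]
                              if pick_a else
                              [(ranking_b, team_b), (ranking_a, team_a)]):
            # Take the picking team's highest-ranked document not yet shown.
            doc = next((d for d in ranking if d not in interleaved), None)
            if doc is not None:
                interleaved.append(doc)
                team.add(doc)
                break
    return interleaved, team_a, team_b

def credit(clicked_docs, team_a, team_b):
    """The system whose contributed documents received more clicks wins
    this impression; equal click counts yield a tie."""
    a = sum(1 for d in clicked_docs if d in team_a)
    b = sum(1 for d in clicked_docs if d in team_b)
    return "A" if a > b else "B" if b > a else "tie"

if __name__ == "__main__":
    ranking_a = ["d1", "d2", "d3", "d4"]   # toy output of system A
    ranking_b = ["d3", "d1", "d5", "d2"]   # toy output of system B
    shown, team_a, team_b = team_draft_interleave(ranking_a, ranking_b)
    print(shown)                            # interleaved list shown to the user
    print(credit(["d3"], team_a, team_b))   # a click on d3 credits the team that picked it
```

Over many query impressions, these per-impression preferences are aggregated (for example, as the fraction of wins for each system) to decide which ranker users prefer.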
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Radlinski, F., Hofmann, K. (2013). Practical Online Retrieval Evaluation. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_107
DOI: https://doi.org/10.1007/978-3-642-36973-5_107
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36972-8
Online ISBN: 978-3-642-36973-5