Abstract
Experimental evaluation has always been central to Information Retrieval research. The field is increasingly moving towards online evaluation, which involves experimenting with real, unsuspecting users in their natural task environments: a so-called living lab. Specifically, with the recent introduction of the Living Labs for IR Evaluation initiative at CLEF and the OpenSearch track at TREC, researchers now have direct access to such labs. With these benchmarking platforms in place, we believe that online evaluation will be an exciting area to work on in the future. This half-day tutorial aims to provide a comprehensive overview of the underlying theory and to complement it with practical guidance.
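A core technique behind the online evaluation methodology the tutorial covers is interleaved comparison: two rankers are compared by merging their result lists and crediting user clicks to whichever system contributed each clicked document. The sketch below illustrates team-draft interleaving, one standard variant; the function names and the click-credit helper are illustrative, not part of any specific living-lab API.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Merge two rankings team-draft style, recording which system
    ('A' or 'B') contributed each document in the shown list."""
    rng = rng or random.Random()
    interleaved, teams, seen = [], [], set()
    while True:
        rem_a = [d for d in ranking_a if d not in seen]
        rem_b = [d for d in ranking_b if d not in seen]
        if not rem_a and not rem_b:
            break
        # The team with fewer contributions so far picks next;
        # ties are broken by a coin flip.
        a_picks = (not rem_b) or (bool(rem_a) and (
            teams.count("A") < teams.count("B")
            or (teams.count("A") == teams.count("B") and rng.random() < 0.5)))
        doc, team = (rem_a[0], "A") if a_picks else (rem_b[0], "B")
        interleaved.append(doc)
        teams.append(team)
        seen.add(doc)
    return interleaved, teams

def infer_preference(teams, clicked_ranks):
    """Credit each click to the team that contributed the clicked
    document and return the winning system (or a tie)."""
    wins_a = sum(1 for r in clicked_ranks if teams[r] == "A")
    wins_b = sum(1 for r in clicked_ranks if teams[r] == "B")
    if wins_a > wins_b:
        return "A"
    if wins_b > wins_a:
        return "B"
    return "tie"
```

Aggregated over many real queries and users, these per-impression outcomes yield a statistically grounded preference between the two rankers without explicit relevance judgments.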
© 2016 Springer International Publishing Switzerland
Cite this paper
Schuth, A., Balog, K. (2016). Living Labs for Online Evaluation: From Theory to Practice. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science(), vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_88
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1