DOI: 10.1145/3077136.3080644

On the Reusability of "Living Labs" Test Collections: A Case Study of Real-Time Summarization

Published: 07 August 2017

Abstract

Information retrieval test collections are typically built using data from large-scale evaluations in international forums such as TREC, CLEF, and NTCIR. Previous validation studies on pool-based test collections for ad hoc retrieval have examined whether they can be reused to accurately assess the effectiveness of systems that did not participate in the original evaluation. To our knowledge, the reusability of test collections derived from "living labs" evaluations, based on logs of user activity, has not been explored. In this paper, we perform a "leave-one-out" analysis of human judgment data derived from the TREC 2016 Real-Time Summarization Track and show that those judgments do not appear to be reusable. While this finding is limited to one specific evaluation, it does call into question the reusability of test collections built from living labs in general, and at the very least suggests the need for additional work in validating such experimental instruments.
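To make the methodology concrete, the sketch below illustrates one way a leave-one-out reusability analysis of this kind can be set up: for each participating system, the judgments that only it contributed are withheld, every run is re-scored against the reduced judgments, and the held-out system's change in rank is recorded. The data structures, the toy precision_at_k metric, and the function names are hypothetical placeholders; the paper's actual analysis operates on the TREC 2016 Real-Time Summarization judgments and the track's official measures.

```python
# Minimal sketch of a leave-one-out reusability analysis over judgment data,
# in the spirit of the study described above. All data structures, names, and
# the toy metric are assumptions for illustration only.

def precision_at_k(run, qrels, k=10):
    """Toy effectiveness metric: fraction of the top-k results judged relevant.
    Unjudged documents count as non-relevant, as in standard pooled evaluation."""
    scores = []
    for topic, docs in run.items():
        hits = sum(qrels.get((topic, d), 0) > 0 for d in docs[:k])
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0

def leave_one_out_rank_shifts(runs, qrels, contributions):
    """runs:          {system: {topic: [doc, ...]}} ranked results per system
    qrels:         {(topic, doc): relevance grade}
    contributions: {system: {(topic, doc), ...}} pairs judged only because
                   that system surfaced them.
    For each system, drop its unique contributions from the judgments,
    re-rank all systems, and report how far the held-out system moves."""
    def ranking(judgments):
        return sorted(runs, key=lambda s: precision_at_k(runs[s], judgments),
                      reverse=True)

    full_ranking = ranking(qrels)
    shifts = {}
    for held_out in runs:
        reduced = {pair: rel for pair, rel in qrels.items()
                   if pair not in contributions.get(held_out, set())}
        shifts[held_out] = (ranking(reduced).index(held_out)
                            - full_ranking.index(held_out))
    return shifts
```

In this sketch, a large positive shift for a held-out system means it drops in the ranking once its own contributions are removed from the judgments, i.e., non-contributing systems would be systematically underestimated; that is the failure mode a reusability analysis of this kind probes.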


Cited By

  • (2022) Toward Cranfield-inspired reusability assessment in interactive information retrieval evaluation. Information Processing & Management 59(5), 103007. DOI: 10.1016/j.ipm.2022.103007
  • (2020) Update Delivery Mechanisms for Prospective Information Needs. Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, 308-312. DOI: 10.1145/3343413.3377988
  • (2020) Reproducible Online Search Experiments. Advances in Information Retrieval, 597-601. DOI: 10.1007/978-3-030-45442-5_77

        Published In

        SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
        August 2017
        1476 pages
        ISBN:9781450350228
        DOI:10.1145/3077136

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. interleaved evaluations
        2. push notifications
        3. reusability
        4. user studies

        Qualifiers

        • Short-paper



        Acceptance Rates

        SIGIR '17 paper acceptance rate: 78 of 362 submissions (22%)
        Overall acceptance rate: 792 of 3,983 submissions (20%)

