
On run diversity in Evaluation as a Service

Published: 03 July 2014
DOI: 10.1145/2600428.2609484

Abstract

"Evaluation as a service" (EaaS) is a new methodology that enables community-wide evaluations and the construction of test collections on documents that cannot be distributed. The basic idea is that evaluation organizers provide a service API through which the evaluation task can be completed. However, this concept violates some of the premises of traditional pool-based collection building and thus calls into question the quality of the resulting test collection. In particular, the service API might restrict the diversity of runs that contribute to the pool: this might hamper innovation by researchers and lead to incomplete judgment pools that affect the reusability of the collection. This paper shows that the distinctiveness of the retrieval runs used to construct the first test collection built using EaaS, the TREC 2013 Microblog collection, is not substantially different from that of the TREC-8 ad hoc collection, a high-quality collection built using traditional pooling. Further analysis using the `leave out uniques' test suggests that pools from the Microblog 2013 collection are less complete than those from TREC-8, although both collections benefit from the presence of distinctive and effective manual runs. Although we cannot yet generalize to all EaaS implementations, our analyses reveal no obvious flaws in the test collection built using the methodology in the TREC 2013 Microblog track.


Cited By

  • (2017) Finally, a Downloadable Test Collection of Tweets. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1225-1228. DOI: 10.1145/3077136.3080667. Online publication date: 7-Aug-2017.
  • (2016) Retrievability in API-Based "Evaluation as a Service". Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, pp. 91-94. DOI: 10.1145/2970398.2970427. Online publication date: 12-Sep-2016.
  • (2015) On the Reusability of Open Test Collections. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 827-830. DOI: 10.1145/2766462.2767788. Online publication date: 9-Aug-2015.

    Published In

    SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
    July 2014
    1330 pages
    ISBN:9781450322577
    DOI:10.1145/2600428


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. meta-evaluation
    2. reusability
    3. test collections

    Qualifiers

    • Poster


    Conference

    SIGIR '14

    Acceptance Rates

SIGIR '14 paper acceptance rate: 82 of 387 submissions (21%)
    Overall acceptance rate: 792 of 3,983 submissions (20%)


