DOI: 10.1145/2396761.2398553

Differences in effectiveness across sub-collections

Published: 29 October 2012

ABSTRACT

The relative performance of retrieval systems when evaluated on one part of a test collection may bear little or no similarity to the relative performance measured on a different part of the collection. In this paper we report the results of a detailed study of the impact that different sub-collections have on retrieval effectiveness, analyzing the effect over many collections, and with different approaches to sub-dividing the collections. The effect is shown to be substantial, impacting on comparisons between retrieval runs that are statistically significant. Some possible causes for the effect are investigated, and the implications of this work are examined for test collection design and for the strength of conclusions one can draw from experimental results.
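As an illustration of the comparison described above (a sketch only, not the authors' code or data), the following Python snippet applies a paired t-test to per-topic average precision scores of two hypothetical runs on two sub-collections and reports whether a statistically significant ordering on one sub-collection is reproduced on the other. The function name compare_runs, the synthetic beta-distributed scores, and the choice of a paired t-test are assumptions made for this example.

    # Hypothetical sketch (not the paper's code): given per-topic average
    # precision (AP) scores for two retrieval runs on two sub-collections,
    # test each sub-collection separately and report whether a significant
    # difference on one part of the collection also appears on the other.
    import numpy as np
    from scipy.stats import ttest_rel

    def compare_runs(ap_run1, ap_run2, alpha=0.05):
        """Paired t-test over per-topic scores; returns means, p-value, verdict."""
        t_stat, p_value = ttest_rel(ap_run1, ap_run2)
        return {
            "MAP run1": float(np.mean(ap_run1)),
            "MAP run2": float(np.mean(ap_run2)),
            "p": float(p_value),
            "significant": bool(p_value < alpha),
        }

    # Synthetic per-topic AP scores (placeholders for real evaluation output).
    rng = np.random.default_rng(42)
    topics = 50
    sub_a = {"run1": rng.beta(2, 5, topics), "run2": rng.beta(2, 5, topics) + 0.05}
    sub_b = {"run1": rng.beta(2, 5, topics) + 0.04, "run2": rng.beta(2, 5, topics)}

    for name, scores in [("sub-collection A", sub_a), ("sub-collection B", sub_b)]:
        print(name, compare_runs(scores["run1"], scores["run2"]))

With real evaluation output (for example, per-topic AP produced by trec_eval), the same comparison would expose the kind of inconsistency the paper studies: a pair of runs that differs significantly on one sub-collection may show no difference, or the opposite ordering, on another.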

    • Published in

      CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
      October 2012
      2840 pages
      ISBN: 9781450311564
      DOI: 10.1145/2396761

      Copyright © 2012 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • short-paper

      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

