Article

Dynamic test collections: measuring search effectiveness on the live web

Author:

Ian Soboroff

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 276 - 283

https://doi.org/10.1145/1148170.1148220

Published: 06 August 2006

Abstract

Existing methods for measuring the quality of search algorithms use a static collection of documents. A set of queries and a mapping from the queries to the relevant documents allow the experimenter to see how well different search engines or engine configurations retrieve the correct answers. This methodology assumes that the document set and thus the set of relevant documents are unchanging. In this paper, we abandon the static collection requirement. We begin with a recent TREC collection created from a web crawl and analyze how the documents in that collection have changed over time. We determine how decay of the document collection affects TREC systems, and present the results of an experiment using the decayed collection to measure a live web search system. We employ novel measures of search effectiveness that are robust despite incomplete relevance information. Lastly, we propose a methodology of "collection maintenance" which supports measuring search performance both for a single system and between systems run at different points in time.
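The abstract does not name the measures it uses. As an illustration of what "robust despite incomplete relevance information" can mean, the sketch below computes bpref, a measure from the literature (Buckley and Voorhees, SIGIR 2004) that scores a ranking using only judged documents. Whether the paper uses exactly this formulation is an assumption; the function name `bpref`, the document identifiers, and the `min(R, N)` normalization (a common variant) are illustrative choices, not taken from the paper.

```python
def bpref(ranking, relevant, nonrelevant):
    """Compute bpref for one topic.

    ranking     -- list of document ids, best first
    relevant    -- set of ids judged relevant
    nonrelevant -- set of ids judged nonrelevant
    Documents in neither set are unjudged and are skipped entirely.
    """
    R = len(relevant)
    N = len(nonrelevant)
    if R == 0:
        return 0.0
    denom = min(R, N)
    nonrel_seen = 0  # judged-nonrelevant documents ranked so far
    total = 0.0
    for doc in ranking:
        if doc in relevant:
            if denom == 0:
                # No judged-nonrelevant documents exist: no penalty possible.
                total += 1.0
            else:
                total += 1.0 - min(nonrel_seen, denom) / denom
        elif doc in nonrelevant:
            nonrel_seen += 1
    return total / R


# Hypothetical topic: d3 was never judged, so it does not affect the score.
ranking = ["d1", "d2", "d3", "d4", "d5"]
score_full = bpref(ranking, relevant={"d1", "d4"}, nonrelevant={"d2", "d5"})

# Simulated decay: d2 has vanished from the web, so it is dropped from both
# the ranking and the judgments before scoring.
decayed_ranking = ["d1", "d3", "d4", "d5"]
score_decayed = bpref(decayed_ranking, relevant={"d1", "d4"}, nonrelevant={"d5"})

print(score_full, score_decayed)
```

Because unjudged documents are skipped rather than counted as nonrelevant, a measure of this kind can still be computed when a live system returns pages that were never assessed, which is the situation a decaying web collection creates.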

Cited By

Keller J, Breuer T, Schaer P, Oosterhuis H, Bast H, Xiong C (2024). Evaluation of Temporal Change in IR Test Collections. Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, pages 3-13. DOI: 10.1145/3664190.3672530. Online publication date: 2-Aug-2024
https://dl.acm.org/doi/10.1145/3664190.3672530
Frew L, Nelson M, Weigle M, Serra E, Spezzano F (2024). Retrogressive Document Manipulation of US Federal Environmental Websites. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 3762-3766. DOI: 10.1145/3627673.3679988. Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3679988
Keller J, Breuer T, Schaer P (2024). Replicability Measures for Longitudinal Information Retrieval Evaluation. Experimental IR Meets Multilinguality, Multimodality, and Interaction, pages 215-226. DOI: 10.1007/978-3-031-71736-9_16. Online publication date: 14-Sep-2024
https://doi.org/10.1007/978-3-031-71736-9_16

Index Terms

Dynamic test collections: measuring search effectiveness on the live web
1. Human-centered computing
  1. Collaborative and social computing
    1. Collaborative and social computing systems and tools
2. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
  2. World Wide Web

Information & Contributors

Information

Published In

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

August 2006

768 pages

ISBN:1595933697

DOI:10.1145/1148170

General Chair: Efthimis N. Efthimiadis (University of Washington)

Program Chairs: Susan Dumais (Microsoft Research, Redmond), David Hawking (CSIRO ICT Centre, Canberra, Australia), Kalervo Järvelin (University of Tampere, Finland)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

retrieval test collections

Qualifiers

Article

Conference

SIGIR06

SIGIR06: The 29th Annual International SIGIR Conference

August 6 - 11, 2006

Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 20
Total Downloads: 869
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 0

Reflects downloads up to 26 Dec 2024

Citations

Cited By

Galuscáková P, Deveaud R, González Sáez G, Mulhem P, Goeuriot L, Piroi F, Popel M, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3086-3094. DOI: 10.1145/3539618.3591921. Online publication date: 19-Jul-2023
https://dl.acm.org/doi/10.1145/3539618.3591921
Dadure P, Pakray P, Bandyopadhyay S (2022). Embedding and generalization of formula with context in the retrieval of mathematical information. Journal of King Saud University - Computer and Information Sciences 34:9, pages 6624-6634. DOI: 10.1016/j.jksuci.2021.05.014. Online publication date: Oct-2022
https://doi.org/10.1016/j.jksuci.2021.05.014
González-Sáez G, Mulhem P, Goeuriot L (2021). Towards the Evaluation of Information Retrieval Systems on Evolving Datasets with Pivot Systems. Experimental IR Meets Multilinguality, Multimodality, and Interaction, pages 91-102. DOI: 10.1007/978-3-030-85251-1_8. Online publication date: 14-Sep-2021
https://doi.org/10.1007/978-3-030-85251-1_8
Kutlu M, Elsayed T, Hasanain M, Lease M, Cuzzocrea A, Allan J, Paton N, Srivastava D, Agrawal R, Broder A, Zaki M, Candan S, Labrinidis A, Schuster A, Wang H (2018). When Rank Order Isn't Enough. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 397-406. DOI: 10.1145/3269206.3271751. Online publication date: 17-Oct-2018
https://dl.acm.org/doi/10.1145/3269206.3271751
Carterette B (2018). Search Engine Metrics. Encyclopedia of Database Systems, pages 3328-3333. DOI: 10.1007/978-1-4614-8265-9_325. Online publication date: 7-Dec-2018
https://doi.org/10.1007/978-1-4614-8265-9_325
Carterette B (2017). Search Engine Metrics. Encyclopedia of Database Systems, pages 1-6. DOI: 10.1007/978-1-4899-7993-3_325-2. Online publication date: 5-Jan-2017
https://doi.org/10.1007/978-1-4899-7993-3_325-2
Zhou Z, Xiang S, Chen T (2016). Metamorphic Testing for Software Quality Assessment: A Study of Search Engines. IEEE Transactions on Software Engineering 42:3, pages 264-284. DOI: 10.1109/TSE.2015.2478001. Online publication date: 1-Mar-2016
https://doi.org/10.1109/TSE.2015.2478001
