DOI: 10.1145/1148170.1148220
Article

Dynamic test collections: measuring search effectiveness on the live web

Published: 06 August 2006

Abstract

Existing methods for measuring the quality of search algorithms use a static collection of documents. A set of queries and a mapping from the queries to the relevant documents allow the experimenter to see how well different search engines or engine configurations retrieve the correct answers. This methodology assumes that the document set and thus the set of relevant documents are unchanging. In this paper, we abandon the static collection requirement. We begin with a recent TREC collection created from a web crawl and analyze how the documents in that collection have changed over time. We determine how decay of the document collection affects TREC systems, and present the results of an experiment using the decayed collection to measure a live web search system. We employ novel measures of search effectiveness that are robust despite incomplete relevance information. Lastly, we propose a methodology of "collection maintenance" which supports measuring search performance both for a single system and between systems run at different points in time.
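The abstract describes the Cranfield-style setup (queries plus a mapping from queries to relevant documents) and the complication of the live web: judged documents disappear while new, unjudged pages appear. The Python sketch below illustrates that setting under stated assumptions and is not the paper's method. It contrasts non-interpolated average precision, which treats unjudged documents as nonrelevant, with bpref from Buckley and Voorhees (SIGIR 2004), a known measure that skips unjudged documents. bpref is swapped in only as an example of a measure designed for incomplete judgments. The abstract does not name the novel measures this paper proposes. The `drop_decayed` and `split_judgments` helpers, the document IDs, and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical judgments for one query: doc ID -> grade (>0 relevant, 0 judged nonrelevant).
Qrels = Dict[str, int]


def drop_decayed(qrels: Qrels, live_docs: Set[str]) -> Qrels:
    """Hypothetical decay step: keep only judged documents still present on the live web."""
    return {doc: grade for doc, grade in qrels.items() if doc in live_docs}


def split_judgments(qrels: Qrels) -> Tuple[Set[str], Set[str]]:
    """Separate judged-relevant from judged-nonrelevant document IDs."""
    relevant = {doc for doc, grade in qrels.items() if grade > 0}
    nonrelevant = {doc for doc, grade in qrels.items() if grade <= 0}
    return relevant, nonrelevant


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated AP; any document not judged relevant counts as a miss."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def bpref(ranking: List[str], relevant: Set[str], nonrelevant: Set[str]) -> float:
    """bpref after Buckley and Voorhees (2004): unjudged documents are skipped."""
    r = len(relevant)
    if r == 0:
        return 0.0
    nonrel_above, score = 0, 0.0
    for doc in ranking:
        if doc in relevant:
            # Penalize by judged nonrelevant documents ranked above, capped at R.
            score += 1.0 - min(nonrel_above, r) / r
        elif doc in nonrelevant:
            nonrel_above += 1
    # Relevant documents that were never retrieved contribute nothing.
    return score / r


# Toy data (hypothetical): d3 has vanished from the web; d5 is new and unjudged.
qrels = {"d1": 1, "d2": 0, "d3": 1, "d4": 0}
live = {"d1", "d2", "d4", "d5"}
run = ["d5", "d1", "d2", "d4"]  # ranking returned by a live search system

rel, nonrel = split_judgments(drop_decayed(qrels, live))
print(average_precision(run, rel))  # 0.5: unjudged d5 is treated as a miss
print(bpref(run, rel, nonrel))      # 1.0: unjudged d5 is ignored
```

In this toy case the decayed judgments leave one relevant page, and the new, unjudged page `d5` ranked first halves average precision but leaves bpref unchanged. That contrast is the kind of robustness to incomplete relevance information the abstract refers to, though the paper's own measures may behave differently.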

References

[1] Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '98), Melbourne, Australia, August 1998. ACM Press.
[2] Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2004), Sheffield, UK, July 2004. ACM Press.
[3] Ziv Bar-Yossef, Andrei Z. Broder, Ravi Kumar, and Andrew Tomkins. Sic transit gloria telae: Towards an understanding of the web's decay. In Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pages 328-337, New York, NY, May 2004.
[4] Yaniv Bernstein and Justin Zobel. A scalable system for identifying co-derivative documents. In Proceedings of the Eleventh Symposium on String Processing and Information Retrieval (SPIRE 2004), Padova, Italy, October 2004.
[5] Brian E. Brewington and George Cybenko. How dynamic is the web? In Proceedings of the 9th International WWW Conference, pages 257-276, Amsterdam, The Netherlands, May 2000.
[6] Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. Syntactic clustering of the web. Computer Networks and ISDN Systems, 29(8-13):1157-1166, 1997.
[7] Chris Buckley and Ellen M. Voorhees. Retrieval evaluation with incomplete information. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2004) [2], pages 25-32.
[8] Ben Carterette and James Allan. Incremental test collections. In Proceedings of the Fourteenth ACM Conference on Information and Knowledge Management (CIKM 2005), Bremen, Germany, November 2005.
[9] Junghoo Cho and Hector Garcia-Molina. The evolution of the web and implications for an incremental crawler. In Proceedings of the 26th International Conference on Very Large Databases (VLDB 2000), pages 200-209, September 2000.
[10] Abdur Chowdhury, Ophir Frieder, David Grossman, and Mary Catherine McCabe. Collection statistics for fast duplicate document detection. ACM Transactions on Information Systems, 20(2):171-191, April 2002.
[11] Charles L. A. Clarke, Ian Soboroff, and Nick Craswell. Overview of the TREC 2004 terabyte track. In E. M. Voorhees and L. P. Buckland, editors, Proceedings of the Thirteenth Text REtrieval Conference (TREC 2004), Gaithersburg, MD, November 2004.
[12] C. W. Cleverdon. The Cranfield tests on index language devices. In Aslib Proceedings, volume 19, pages 173-192, 1967.
[13] Gordon V. Cormack, Christopher R. Palmer, and Charles L. A. Clarke. Efficient construction of large test collections. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '98) [1], pages 282-289.
[14] Dennis Fetterly, Mark Manasse, Marc Najork, and Janet L. Wiener. A large-scale study of the evolution of web pages. Software: Practice and Experience, 34:213-237, 2004.
[15] Alexandros Ntoulas, Junghoo Cho, and Christopher Olston. What's new on the web? The evolution of the web from a search engine perspective. In Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pages 1-12, New York, NY, May 2004.
[16] Mark Sanderson and Justin Zobel. Information retrieval system evaluation: Effort, sensitivity, and reliability. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005), pages 162-169, Salvador, Brazil, August 2005. ACM Press.
[17] Ian Soboroff. On evaluating web search with very few relevant documents. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2004) [2], pages 530-531.
[18] Ian Soboroff and Stephen Robertson. Building a filtering test collection for TREC 2002. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), pages 243-250, Toronto, Canada, July 2003. ACM Press.
[19] Ellen M. Voorhees. Variations in relevance judgments and the measurement of retrieval effectiveness. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '98) [1].
[20] Ellen M. Voorhees and Chris Buckley. The effect of topic set size on retrieval experiment error. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2002), pages 316-323, Tampere, Finland, August 2002. ACM Press.
[21] Ellen M. Voorhees and Donna K. Harman, editors. TREC: Experiments in Information Retrieval Evaluation. MIT Press, 2005.
[22] Justin Zobel. How reliable are the results of large-scale retrieval experiments? In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '98) [1], pages 307-314.

Information

Published In

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
August 2006
768 pages
ISBN: 1595933697
DOI: 10.1145/1148170
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. retrieval test collections

Qualifiers

  • Article

Conference

SIGIR '06: The 29th Annual International SIGIR Conference
August 6-11, 2006
Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 0
Reflects downloads up to 26 Dec 2024.

Cited By

  • (2024) Evaluation of Temporal Change in IR Test Collections. Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, pp. 3-13. DOI: 10.1145/3664190.3672530. Online publication date: 2-Aug-2024.
  • (2024) Retrogressive Document Manipulation of US Federal Environmental Websites. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp. 3762-3766. DOI: 10.1145/3627673.3679988. Online publication date: 21-Oct-2024.
  • (2024) Replicability Measures for Longitudinal Information Retrieval Evaluation. Experimental IR Meets Multilinguality, Multimodality, and Interaction, pp. 215-226. DOI: 10.1007/978-3-031-71736-9_16. Online publication date: 14-Sep-2024.
  • (2023) LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3086-3094. DOI: 10.1145/3539618.3591921. Online publication date: 19-Jul-2023.
  • (2022) Embedding and generalization of formula with context in the retrieval of mathematical information. Journal of King Saud University - Computer and Information Sciences, 34(9), pp. 6624-6634. DOI: 10.1016/j.jksuci.2021.05.014. Online publication date: Oct-2022.
  • (2021) Towards the Evaluation of Information Retrieval Systems on Evolving Datasets with Pivot Systems. Experimental IR Meets Multilinguality, Multimodality, and Interaction, pp. 91-102. DOI: 10.1007/978-3-030-85251-1_8. Online publication date: 14-Sep-2021.
  • (2018) When Rank Order Isn't Enough. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 397-406. DOI: 10.1145/3269206.3271751. Online publication date: 17-Oct-2018.
  • (2018) Search Engine Metrics. Encyclopedia of Database Systems, pp. 3328-3333. DOI: 10.1007/978-1-4614-8265-9_325. Online publication date: 7-Dec-2018.
  • (2017) Search Engine Metrics. Encyclopedia of Database Systems, pp. 1-6. DOI: 10.1007/978-1-4899-7993-3_325-2. Online publication date: 5-Jan-2017.
  • (2016) Metamorphic Testing for Software Quality Assessment: A Study of Search Engines. IEEE Transactions on Software Engineering, 42(3), pp. 264-284. DOI: 10.1109/TSE.2015.2478001. Online publication date: 1-Mar-2016.
