DOI: 10.1145/3331184.3331354
Short paper

Dynamic Sampling Meets Pooling

Published: 18 July 2019

Abstract

A team of six assessors used Dynamic Sampling (Cormack and Grossman 2018) and one hour of assessment effort per topic to form, without pooling, a test collection for the TREC 2018 Common Core Track. Later, official relevance assessments were rendered by NIST for documents selected by depth-10 pooling augmented by move-to-front (MTF) pooling (Cormack et al. 1998), as well as the documents selected by our Dynamic Sampling effort. MAP estimates rendered from dynamically sampled assessments using the xinfAP statistical evaluator are comparable to those rendered from the complete set of official assessments using the standard trec_eval tool. MAP estimates rendered using only documents selected by pooling, on the other hand, differ substantially. The results suggest that the use of Dynamic Sampling without pooling can, for an order of magnitude less assessment effort, yield information-retrieval effectiveness estimates that exhibit lower bias, lower error, and comparable ability to rank system effectiveness.
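To make the pool-based baseline concrete, the sketch below illustrates the two selection strategies named above: depth-10 pooling and a simplified move-to-front (MTF) augmentation in the spirit of Cormack et al. 1998. It is not the track's official tooling or the authors' implementation; the names `runs` (run id to ranked list of document ids for one topic), `judge` (a stand-in for an assessor), and `budget` are illustrative assumptions.

```python
# Minimal sketch (assumed interfaces, not the TREC tooling) of depth-k pooling
# and a simplified move-to-front (MTF) pooling pass.
from collections import deque
from typing import Callable, Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int = 10) -> Set[str]:
    """Depth-k pooling: the union of the top-k documents from every run."""
    return {doc for ranking in runs.values() for doc in ranking[:k]}


def mtf_pool(runs: Dict[str, List[str]],
             judge: Callable[[str], bool],
             budget: int) -> Dict[str, bool]:
    """Simplified MTF pooling: keep drawing documents from the run at the front
    of the queue while it yields relevant ones; demote the run to the back as
    soon as it yields a non-relevant one. Stops when the budget is spent."""
    queue = deque(runs)                        # run ids; front = highest priority
    cursor = {run: 0 for run in runs}          # next unjudged rank in each run
    judgments: Dict[str, bool] = {}
    while queue and len(judgments) < budget:
        run = queue[0]
        ranking = runs[run]
        # skip documents already judged via another run
        while cursor[run] < len(ranking) and ranking[cursor[run]] in judgments:
            cursor[run] += 1
        if cursor[run] >= len(ranking):
            queue.popleft()                    # this run is exhausted
            continue
        doc = ranking[cursor[run]]
        cursor[run] += 1
        judgments[doc] = judge(doc)
        if not judgments[doc]:
            queue.rotate(-1)                   # move this run to the back
    return judgments


# Toy usage with hypothetical data: doc ids starting with "rel" are relevant.
runs = {"runA": ["rel1", "d2", "rel3"], "runB": ["d2", "rel1", "d4"]}
pool = depth_k_pool(runs, k=2)                 # {"rel1", "d2"}
qrels = mtf_pool(runs, judge=lambda d: d.startswith("rel"), budget=4)
```

Dynamic Sampling, by contrast, selects documents probabilistically rather than exhaustively judging a pool, and the resulting judgments are scored with a sampling-aware estimator such as xinfAP instead of trec_eval.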

References

[1] Abualsaud, M., Ghelani, N., Zhang, H., Smucker, M. D., Cormack, G. V., and Grossman, M. R. A system for efficient high-recall retrieval. In SIGIR 2018.
[2] Allan, J., Harman, D., Kanoulas, E., and Voorhees, E. TREC 2018 Common Core Track overview. In TREC 2018.
[3] Cormack, G. V., and Grossman, M. R. Beyond pooling. In SIGIR 2018.
[4] Cormack, G. V., and Grossman, M. R. Engineering quality and reliability in technology-assisted review. In SIGIR 2016.
[5] Cormack, G. V., and Grossman, M. R. Evaluation of machine-learning protocols for technology-assisted review in electronic discovery. In SIGIR 2014.
[6] Cormack, G. V., and Grossman, M. R. Unbiased low-variance estimators for precision and related information retrieval effectiveness measures. In SIGIR 2019.
[7] Cormack, G. V., Palmer, C. R., and Clarke, C. L. Efficient construction of large test collections. In SIGIR 1998.
[8] Roegiest, A., and Cormack, G. V. Impact of review-set selection on human assessment for text classification. In SIGIR 2016.
[9] Sanderson, M., et al. Test collection based evaluation of information retrieval systems. Foundations and Trends® in Information Retrieval 4, 4 (2010).
[10] Yilmaz, E., Aslam, J. A., and Robertson, S. A new rank correlation coefficient for information retrieval. In SIGIR 2008.
[11] Yilmaz, E., Kanoulas, E., and Aslam, J. A. A simple and efficient sampling method for estimating AP and NDCG. In SIGIR 2008.
[12] Zhang, H., Abualsaud, M., Ghelani, N., Smucker, M. D., Cormack, G. V., and Grossman, M. R. Effective user interaction for high-recall retrieval: Less is more. In CIKM 2018.



Published In

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2019
1512 pages
ISBN: 9781450361729
DOI: 10.1145/3331184

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

  • Short-paper

Funding Sources

  • Natural Sciences and Engineering Research Council of Canada

Conference

SIGIR '19

Acceptance Rates

SIGIR'19 paper acceptance rate: 84 of 426 submissions (20%)
Overall acceptance rate: 792 of 3,983 submissions (20%)


Cited By

  • (2024) On the Evaluation of Machine-Generated Reports. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1904-1915. DOI: 10.1145/3626772.3657846
  • (2023) HC3: A Suite of Test Collections for CLIR Evaluation over Informal Text. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2880-2889. DOI: 10.1145/3539618.3591893
  • (2022) HC4: A New Suite of Test Collections for Ad Hoc CLIR. In Advances in Information Retrieval, 351-366. DOI: 10.1007/978-3-030-99736-6_24
  • (2019) Quantifying Bias and Variance of System Rankings. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 1089-1092. DOI: 10.1145/3331184.3331356
