short-paper

Unbiased Low-Variance Estimators for Precision and Related Information Retrieval Effectiveness Measures

Authors:
Gordon V. Cormack, University of Waterloo, Waterloo, ON, Canada
Maura R. Grossman, University of Waterloo, Waterloo, ON, Canada

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2019, Pages 945–948
https://doi.org/10.1145/3331184.3331355

Published: 18 July 2019

ABSTRACT

This work describes an estimator from which unbiased measurements of precision, rank-biased precision, and cumulative gain may be derived from a uniform or non-uniform sample of relevance assessments. Adversarial testing supports the theory that our estimator yields unbiased low-variance measurements from sparse samples, even when used to measure results that are qualitatively different from those returned by known information retrieval methods. Our results suggest that test collections using sampling to select documents for relevance assessment yield more accurate measurements than test collections using pooling, especially for the results of retrieval methods not contributing to the pool.
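
The abstract does not spell out the estimator itself, but this page's author tags mention a Horvitz-Thompson evaluator. As a rough illustration only, the sketch below shows the textbook Horvitz-Thompson form for precision at rank k: each assessed document in the top k contributes its relevance divided by its inclusion probability. The function names, the dictionary layout, and the assumption of independent (Poisson) sampling in the expectation check are all hypothetical choices made for this example and are not taken from the paper.

```python
from itertools import product


def ht_precision_at_k(ranking, k, assessed, inclusion_prob):
    """Textbook Horvitz-Thompson estimate of precision at rank k.

    ranking        -- list of document ids, best first
    assessed       -- dict: doc id -> 0/1 relevance, sampled documents only
    inclusion_prob -- dict: doc id -> probability the document was sampled
    """
    total = 0.0
    for doc in ranking[:k]:
        if doc in assessed:
            # Weight each sampled judgment by the inverse of its
            # inclusion probability so rarely sampled documents count more.
            total += assessed[doc] / inclusion_prob[doc]
    return total / k


def exact_expectation(ranking, k, truth, inclusion_prob):
    """Exact expected value of the estimate.

    Assumption for this sketch: every document is included in the sample
    independently with its own probability (Poisson sampling), so all
    2^n possible samples can be enumerated for a small collection.
    """
    docs = list(truth)
    expectation = 0.0
    for mask in product([0, 1], repeat=len(docs)):
        weight = 1.0
        assessed = {}
        for doc, included in zip(docs, mask):
            p = inclusion_prob[doc]
            weight *= p if included else (1.0 - p)
            if included:
                assessed[doc] = truth[doc]
        expectation += weight * ht_precision_at_k(
            ranking, k, assessed, inclusion_prob
        )
    return expectation


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4"]
    truth = {"d1": 1, "d2": 0, "d3": 1, "d4": 0}  # true precision@4 is 0.5
    probs = {"d1": 0.9, "d2": 0.5, "d3": 0.2, "d4": 0.4}
    print(exact_expectation(ranking, 4, truth, probs))
```

Under these assumptions the expected value of the estimate matches the true precision for any positive inclusion probabilities, which is the sense in which such an estimator is unbiased. The variance depends heavily on how the probabilities are chosen, and this toy example does not address that.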

Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Retrieval effectiveness
      2. Test collections

Published in
SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2019
1512 pages
ISBN: 9781450361729
DOI: 10.1145/3331184
General Chairs:
Benjamin Piwowarski, CNRS - Sorbonne Universite, France
Max Chevalier, Universite de Toulouse, CNRS, France
Eric Gaussier, Universite Grenoble Alpes, CNRS, France
Program Chairs:
Yoelle Maarek, Amazon Research, Israel
Jian-Yun Nie, University of Montreal, Canada
Falk Scholer, RMIT University, Australia
Copyright © 2019 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
dynamic sampling
horvitz-thompson evaluator
nonuniform sampling
test collection
unbiased estimator
Acceptance Rates
SIGIR'19 Paper Acceptance Rate: 84 of 426 submissions, 20%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%