ABSTRACT
The high cost of constructing test collections has led many researchers to develop intelligent document selection methods that find relevant documents with fewer judgments than the standard pooling method requires. In this paper, we conduct a comprehensive set of experiments to evaluate six bandit-based document selection methods in terms of the evaluation reliability, fairness, and reusability of the resultant test collections. In our experiments, the best-performing method varies across test collections, underscoring the importance of using diverse test collections for an accurate performance analysis. Our experiments with six test collections also show that Move-To-Front is the most robust of the methods we investigate.
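For readers unfamiliar with the baseline, the sketch below illustrates the classic Move-To-Front adjudication strategy: runs are kept in a priority queue, the front run contributes documents to be judged as long as they turn out relevant, and a run is demoted to the back as soon as it yields a non-relevant document. This is a minimal Python sketch of the general technique, not the experimental code behind the study; the `runs`, `judge`, and `budget` names are illustrative, and `judge` stands in for a human assessor.

```python
import collections

def move_to_front(runs, judge, budget):
    """Minimal Move-To-Front pooling sketch (illustrative only).

    runs   : dict mapping run id -> ranked list of document ids
    judge  : callable doc_id -> bool, a stand-in relevance oracle
    budget : total number of judgments allowed
    Returns the collected judgments as {doc_id: bool}.
    """
    queue = collections.deque(runs.keys())   # runs in priority order
    cursors = {r: 0 for r in runs}           # next unjudged rank per run
    qrels = {}

    while queue and len(qrels) < budget:
        run = queue[0]
        ranking = runs[run]
        # skip documents already judged via other runs
        while cursors[run] < len(ranking) and ranking[cursors[run]] in qrels:
            cursors[run] += 1
        if cursors[run] >= len(ranking):     # run exhausted: retire it
            queue.popleft()
            continue
        doc = ranking[cursors[run]]
        cursors[run] += 1
        qrels[doc] = judge(doc)
        if not qrels[doc]:                   # non-relevant: demote the run
            queue.popleft()
            queue.append(run)
    return qrels
```

A toy invocation, with a hypothetical relevance oracle: `move_to_front({"bm25": ["d1", "d2", "d3"], "lm": ["d2", "d4", "d1"]}, judge=lambda d: d in {"d1", "d4"}, budget=4)`. The shared `qrels` dictionary captures the key cost saving: a document judged once counts for every run that retrieved it.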