DOI: 10.1145/2065003.2065020
poster

Optimizing the cost of information retrieval test collections

Published: 28 October 2011

Abstract

We consider the problem of optimally allocating limited resources to construct relevance judgements for a test collection that facilitates reliable evaluation of retrieval systems. We assume that there is a large set of test queries, for each of which a large number of documents need to be judged, though the available budget only permits judging a subset of them. A candidate solution to this problem has to deal with at least three challenges. (i) Given a fixed budget, it has to efficiently select a subset of query-document pairs for acquiring relevance judgements. (ii) With the collected relevance judgements, it has to not only accurately evaluate the set of systems that participate in the construction of the test collection but also reliably assess the performance of new, as yet unseen systems. (iii) Finally, it has to properly deal with uncertainty due to (a) the presence of unjudged documents in a ranked list, (b) the presence of queries with no relevance judgements, and (c) errors made by human assessors when labelling documents. In this thesis we propose an optimisation framework that accommodates appropriate solutions for each of the three challenges. Our approach is intended to benefit the construction of IR test collections by research institutes, e.g. NIST, or by commercial search engines, e.g. Google and Bing, where large-scale document collections and extensive query logs are available but economic constraints prohibit gathering comprehensive relevance judgements.
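
The abstract does not spell out the optimisation framework itself (the author tags point to convex optimisation), so the following is only a minimal illustrative sketch of the budget-constrained selection step in challenge (i): greedily picking query-document pairs by an estimated benefit score until the judging budget is spent. Every name here (select_pairs_greedy, expected_benefit, the toy scores) is a hypothetical assumption and is not taken from the thesis.

# Illustrative sketch only: a greedy baseline for budget-constrained selection
# of query-document pairs to judge. The thesis proposes a convex optimisation
# framework whose details are not given in the abstract; every name and the
# benefit scores below are hypothetical assumptions.

from typing import Callable, Iterable

def select_pairs_greedy(
    candidates: Iterable[tuple[str, str]],          # (query_id, doc_id) pairs
    expected_benefit: Callable[[str, str], float],  # hypothetical utility estimate
    cost_per_judgement: float,
    budget: float,
) -> list[tuple[str, str]]:
    """Pick query-document pairs in decreasing order of estimated benefit
    until the judging budget is exhausted."""
    ranked = sorted(candidates, key=lambda qd: expected_benefit(*qd), reverse=True)
    selected, spent = [], 0.0
    for query_id, doc_id in ranked:
        if spent + cost_per_judgement > budget:
            break
        selected.append((query_id, doc_id))
        spent += cost_per_judgement
    return selected

if __name__ == "__main__":
    # Toy example: four candidate pairs, budget for three judgements.
    pairs = [("q1", "d1"), ("q1", "d2"), ("q2", "d3"), ("q2", "d4")]
    toy_benefit = {("q1", "d1"): 0.9, ("q1", "d2"): 0.4,
                   ("q2", "d3"): 0.7, ("q2", "d4"): 0.2}
    chosen = select_pairs_greedy(pairs, lambda q, d: toy_benefit[(q, d)],
                                 cost_per_judgement=1.0, budget=3.0)
    print(chosen)  # [('q1', 'd1'), ('q2', 'd3'), ('q1', 'd2')]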


Cited By

  • (2016) "A Short Survey on Online and Offline Methods for Search Quality Evaluation," Information Retrieval, pp. 38-87. DOI: 10.1007/978-3-319-41718-9_3. Online publication date: 26-Jul-2016.
  • (2015) "New Research Directions in Knowledge Discovery and Allied Spheres," ACM SIGKDD Explorations Newsletter, 16(2), pp. 46-49. DOI: 10.1145/2783702.2783708. Online publication date: 21-May-2015.
  • (2011) "PIKM 2011," Proceedings of the 20th ACM international conference on Information and knowledge management, pp. 2633-2634. DOI: 10.1145/2063576.2064049. Online publication date: 24-Oct-2011.



      Published In

      PIKM '11: Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management
      October 2011
      100 pages
ISBN: 9781450309530
DOI: 10.1145/2065003


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 28 October 2011


      Author Tags

      1. convex optimisation
      2. evaluation
      3. resource allocation
      4. test collection

      Qualifiers

      • Poster

      Conference

      CIKM '11

      Acceptance Rates

      Overall Acceptance Rate 25 of 62 submissions, 40%


