DOI: 10.1145/2600428.2609534
SIGIR '14 poster

The effect of expanding relevance judgements with duplicates

Published: 03 July 2014

Abstract

We examine the effects of expanding a judged set of sentences with their exact duplicates from a corpus. Including new sentences that are exact duplicates of the previously judged sentences may allow for better estimation of performance metrics and enhance the reusability of a test collection. We perform experiments in the context of the Temporal Summarization Track at TREC 2013. We find that adding duplicate sentences to the judged set does not significantly affect relative system performance. However, we do find statistically significant changes in the performance of nearly half the systems that participated in the Track. We therefore recommend adding exact duplicate sentences to the set of relevance judgements to obtain a more accurate estimate of system performance.
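The expansion step the abstract describes can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the data layout (sentence-ID dictionaries), the labels, and the choice of hashing exact text to detect duplicates are all assumptions.

```python
from hashlib import sha256

def expand_judgements(judged, corpus):
    """Propagate relevance labels from judged sentences to their
    exact duplicates elsewhere in the corpus.

    judged: dict mapping sentence_id -> (text, label)
    corpus: dict mapping sentence_id -> text
    Returns a dict sentence_id -> label covering the judged
    sentences plus every exact duplicate found in the corpus.
    """
    # Index the judged sentences by a hash of their exact text.
    label_by_hash = {}
    for sid, (text, label) in judged.items():
        label_by_hash[sha256(text.encode()).hexdigest()] = label

    # Start from the original judgements, then scan the corpus
    # for unjudged sentences whose text matches a judged one.
    expanded = {sid: label for sid, (_, label) in judged.items()}
    for sid, text in corpus.items():
        if sid in expanded:
            continue
        h = sha256(text.encode()).hexdigest()
        if h in label_by_hash:
            expanded[sid] = label_by_hash[h]  # inherit the duplicate's label
    return expanded
```

For example, if sentence `s2` in the corpus is character-for-character identical to a judged relevant sentence `s1`, `s2` inherits `s1`'s label, while non-duplicate sentences remain unjudged.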




    Published In

    SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
    July 2014
    1330 pages
    ISBN:9781450322577
    DOI:10.1145/2600428


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. duplicate detection
    2. evaluation
    3. pooling

    Qualifiers

    • Poster

    Conference

    SIGIR '14

    Acceptance Rates

    SIGIR '14 paper acceptance rate: 82 of 387 submissions (21%)
    Overall acceptance rate: 792 of 3,983 submissions (20%)

    Article Metrics

    • Downloads (Last 12 months)1
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 02 Mar 2025


    Cited By

    • (2019) On enhancing the robustness of timeline summarization test collections. Information Processing and Management 56(5):1815-1836. DOI: 10.1016/j.ipm.2019.02.006
    • (2017) EveTAR: building a large-scale multi-task test collection over Arabic tweets. Information Retrieval Journal 21(4):307-336. DOI: 10.1007/s10791-017-9325-7
    • (2016) Optimizing Nugget Annotations with Active Learning. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 2359-2364. DOI: 10.1145/2983323.2983694
    • (2016) A Study of Realtime Summarization Metrics. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 2125-2130. DOI: 10.1145/2983323.2983653
    • (2015) Online News Tracking for Ad-Hoc Information Needs. Proceedings of the 2015 International Conference on the Theory of Information Retrieval, 221-230. DOI: 10.1145/2808194.2809474
    • (2015) Evaluating Streams of Evolving News Events. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 675-684. DOI: 10.1145/2766462.2767751
    • (2015) Entity-Centric Stream Filtering and Ranking: Filtering and Unfilterable Documents. Advances in Information Retrieval, 303-314. DOI: 10.1007/978-3-319-16354-3_33
