Are Secondary Assessors Uncertain When They Disagree About Relevance Judgements?

ABSTRACT
The collection of relevance judgements by assessors is important for many information retrieval (IR) tasks. In addition to the construction of test collections, relevance judging is critical to e-discovery and other applications where many assessors are hired to judge relevance. It is well known that assessors may differ in their judgements of a given document. One possible cause of such a difference is that an assessor is uncertain about a judgement and thus is, in effect, guessing the document's relevance. If assessors are aware of their uncertainty and can self-report their level of certainty, then uncertain relevance judgements can be targeted for adjudication by additional assessors. In this paper, we report on a user study with 48 participants that tests our hypothesis that assessors will be uncertain about their relevance judgements when they are likely to disagree with each other. We found that assessors judge low-consensus documents, i.e. documents known to produce assessor disagreement, with almost as much certainty as high-consensus documents. In particular, assessor self-reported uncertainty is predictive of disagreement only for high-consensus documents and not for low-consensus documents.
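To make the adjudication idea in the abstract concrete, the following is a minimal sketch, not taken from the paper, of how self-reported certainty could be used to triage judgements for review. The certainty scale, the threshold value, and the `Judgement` and `needs_adjudication` names are all hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class Judgement:
    doc_id: str
    assessor_id: str
    relevant: bool   # the assessor's binary relevance judgement
    certainty: int   # self-reported certainty, 1 (guessing) to 5 (sure)

# Hypothetical threshold: judgements at or below this certainty are
# treated as possible guesses and routed to additional assessors.
CERTAINTY_THRESHOLD = 2

def needs_adjudication(j: Judgement) -> bool:
    """Flag a judgement for adjudication by another assessor when the
    original assessor reports low certainty."""
    return j.certainty <= CERTAINTY_THRESHOLD

judgements = [
    Judgement("doc-001", "a1", relevant=True,  certainty=5),
    Judgement("doc-002", "a1", relevant=False, certainty=1),
]
to_review = [j for j in judgements if needs_adjudication(j)]
print([j.doc_id for j in to_review])  # ['doc-002']
```

One caveat that follows from the paper's own finding: a triage rule like this would catch disagreement mainly on high-consensus documents, since self-reported uncertainty was not predictive of disagreement on low-consensus documents.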