ABSTRACT
Text classifiers are frequently used for high-yield retrieval from large corpora, such as in e-discovery. The classifier is trained by annotating example documents for relevance. These examples may, however, be assessed by people other than those whose conception of relevance is authoritative. In this paper, we examine the impact that disagreement between the actual and the authoritative assessor has on classifier effectiveness, when the classifier is evaluated against the authoritative conception. We find that training on an alternative assessor's annotations leads to a significant decrease in binary classification quality, though less of a decrease in ranking quality. On average, a consumer of the ranking produced by alternative-assessor training would have to go 25% deeper in that ranking to achieve the same yield as with authoritative-assessor training.
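The experimental comparison described above can be sketched in a few lines: train one classifier on authoritative labels and one on an alternative assessor's (noisier) labels, rank documents by classifier score, and measure how deep a reviewer must read each ranking to reach a target yield under the authoritative labels. The sketch below is illustrative only, not the paper's code: it assumes a scikit-learn-style classifier, uses synthetic features and a synthetic 20% label-flip model of assessor disagreement, and the helper `depth_for_yield` is a hypothetical name.

```python
# Illustrative sketch (assumptions noted above, not the paper's method):
# compare depth-for-yield of rankings trained on authoritative vs.
# alternative-assessor labels, both evaluated against authoritative labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def depth_for_yield(ranking, authoritative, target_yield):
    """Smallest depth k at which the top-k of `ranking` contains
    `target_yield` of the authoritatively relevant documents."""
    rel = authoritative[ranking]                 # relevance in ranked order
    cum = np.cumsum(rel)
    needed = target_yield * rel.sum()
    return int(np.searchsorted(cum, needed) + 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                  # synthetic document features
auth = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)
alt = auth.copy()                                # alternative assessor:
flip = rng.random(1000) < 0.2                    # flip 20% of the labels
alt[flip] = 1 - alt[flip]

train, test = np.arange(500), np.arange(500, 1000)
for name, labels in [("authoritative", auth), ("alternative", alt)]:
    clf = LogisticRegression().fit(X[train], labels[train])
    scores = clf.predict_proba(X[test])[:, 1]
    ranking = np.argsort(-scores)                # rank test docs by score
    k = depth_for_yield(ranking, auth[test], target_yield=0.8)
    print(f"{name}-trained ranking: depth {k} for 80% yield")
```

Under this toy noise model, the alternative-trained ranking typically requires a deeper read to reach the same 80% yield, which is the effect the paper quantifies at roughly 25% on real assessment data.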