Error detection in mechanized classification systems

https://doi.org/10.1016/0306-4573(76)90052-2

Abstract

When documentary material is indexed by a mechanized classification system, and the results are judged by trained professionals, the number of documents in disagreement, after suitable adjustment, defines the error rate of the system. In a test case, disagreement was 22%, and of this 22% the computer correctly identified two thirds of the decisions as doubtful. Professional examination of this doubtful group could further improve performance. The characteristics of the classification system, and of the material being classified, are mainly responsible for disagreement, and the size of the computer-identified doubtful group is a basic measure of the suitability of the system for the test material being processed. It is further suggested that if two professionals were compared on the same material, their disagreements would be mainly over the same documents.
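The measurement the abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the function names and data layout are invented, and "adjustment" of the raw disagreement count is omitted. It computes the disagreement rate between the machine and a professional, and the fraction of those disagreements that fall inside the machine-flagged doubtful set.

```python
def disagreement_stats(machine, professional, doubtful):
    """Compare machine and professional class assignments per document.

    machine, professional: equal-length sequences of assigned classes.
    doubtful: equal-length sequence of bools; True where the system
              flagged its own decision as doubtful.

    Returns (disagreement_rate, fraction_of_disagreements_flagged).
    """
    # Indices where the machine and the professional disagree.
    disagreements = [i for i, (m, p) in enumerate(zip(machine, professional))
                     if m != p]
    rate = len(disagreements) / len(machine)
    # How many of the disagreements the machine itself flagged as doubtful.
    flagged = sum(1 for i in disagreements if doubtful[i])
    caught = flagged / len(disagreements) if disagreements else 0.0
    return rate, caught
```

In the test case reported above, `rate` would come out near 0.22 and `caught` near 2/3; a large doubtful set relative to the disagreements it captures would suggest the system is poorly matched to the material.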
