Error detection in mechanized classification systems

https://doi.org/10.1016/0306-4573(76)90052-2

Abstract

When documentary material is indexed by a mechanized classification system, and the results are judged by trained professionals, the number of documents in disagreement, after suitable adjustment, defines the error rate of the system. In a test case, disagreement was 22%, and of this 22% the computer correctly identified two thirds of the decisions as doubtful. Professional examination of this doubtful group could further improve performance. The characteristics of the classification system, and of the material being classified, are mainly responsible for disagreement, and the size of the computer-identified doubtful group is a basic measure of the suitability of the system for the test material being processed. It is further suggested that if two professionals were compared on the same material, their disagreements would be mainly over the same documents.
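The measurement the abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the function names and data layout are invented, and "adjustment" of the raw disagreement count is omitted. It computes the disagreement rate between the machine and a professional, and the fraction of those disagreements that fall inside the machine-flagged doubtful set.

```python
def disagreement_stats(machine, professional, doubtful):
    """Compare machine and professional class assignments per document.

    machine, professional: equal-length sequences of assigned classes.
    doubtful: equal-length sequence of bools; True where the system
              flagged its own decision as doubtful.

    Returns (disagreement_rate, fraction_of_disagreements_flagged).
    """
    # Indices where the machine and the professional disagree.
    disagreements = [i for i, (m, p) in enumerate(zip(machine, professional))
                     if m != p]
    rate = len(disagreements) / len(machine)
    # How many of the disagreements the machine itself flagged as doubtful.
    flagged = sum(1 for i in disagreements if doubtful[i])
    caught = flagged / len(disagreements) if disagreements else 0.0
    return rate, caught
```

In the test case reported above, `rate` would come out near 0.22 and `caught` near 2/3; a large doubtful set relative to the disagreements it captures would suggest the system is poorly matched to the material.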
