Paper
Evaluating text categorization in the presence of OCR errors
21 December 2000
Kazem Taghva, Thomas A. Nartker, Julie Borsack, Steven Lumos, Allen Condit, Ron Young
Proceedings Volume 4307, Document Recognition and Retrieval VIII; (2000) https://doi.org/10.1117/12.410861
Event: Photonics West 2001 - Electronic Imaging, 2001, San Jose, CA, United States
Abstract
In this paper we describe experiments that investigate the effects of OCR errors on text categorization. In particular, we show that in our environment, OCR errors have no effect on categorization when we use a classifier based on the naive Bayes model. We also observe that dimensionality reduction techniques eliminate a large number of OCR errors and improve categorization results.
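The abstract does not say which naive Bayes variant or which dimensionality reduction technique the authors used. The following is only a minimal sketch that illustrates the general idea under stated assumptions. It assumes a multinomial naive Bayes model with add-one (Laplace) smoothing. As a stand-in for dimensionality reduction, it uses document-frequency thresholding, one common technique: terms that occur in fewer than `min_df` training documents are discarded. The function names, the `min_df` parameter, and the toy OCR-garbled tokens (such as "intcrest" for "interest") are hypothetical. They do not come from the paper.

```python
import math
from collections import Counter

# Hypothetical sketch: not the authors' implementation.

def select_vocabulary(docs, min_df=2):
    """Dimensionality reduction by document-frequency thresholding (assumed technique).
    Keeps only terms that appear in at least min_df training documents; misrecognized
    OCR tokens tend to be rare, so many of them fall below the threshold."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term for term, count in df.items() if count >= min_df}

def train_nb(docs, labels, vocab):
    """Multinomial naive Bayes with add-one smoothing over a fixed vocabulary.
    Returns log priors and log conditional term probabilities per class."""
    priors, cond = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d if t in vocab)
        total = sum(counts.values())
        cond[c] = {t: math.log((counts[t] + 1) / (total + len(vocab))) for t in vocab}
    return priors, cond

def classify(tokens, priors, cond, vocab):
    """Return the class with the highest log posterior; tokens outside vocab are ignored."""
    scores = {c: priors[c] + sum(cond[c][t] for t in tokens if t in vocab) for c in priors}
    return max(scores, key=scores.get)

# Toy training data with simulated OCR errors ("intcrest", "sc0re"); purely illustrative.
train = [
    ("interest rate bank loan".split(), "finance"),
    ("bank loan intcrest rate".split(), "finance"),
    ("team game score win".split(), "sports"),
    ("game win team sc0re".split(), "sports"),
]
docs = [tokens for tokens, _ in train]
labels = [label for _, label in train]

vocab = select_vocabulary(docs, min_df=2)
priors, cond = train_nb(docs, labels, vocab)
print(sorted(vocab))  # ['bank', 'game', 'loan', 'rate', 'team', 'win']
print(classify("bank rate intcrest".split(), priors, cond, vocab))  # finance
```

In this toy run, the threshold removes both garbled tokens. It also removes the correctly spelled but rare "interest" and "score". This shows that such pruning discards rare terms in general, not only OCR errors. The classifier still labels a garbled query correctly because the remaining high-frequency terms carry the class signal.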
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Kazem Taghva, Thomas A. Nartker, Julie Borsack, Steven Lumos, Allen Condit, and Ron Young "Evaluating text categorization in the presence of OCR errors", Proc. SPIE 4307, Document Recognition and Retrieval VIII, (21 December 2000); https://doi.org/10.1117/12.410861
CITATIONS
Cited by 22 scholarly publications.
KEYWORDS
Optical character recognition

Machine learning

Rule based systems

Error analysis

Lanthanum

Licensing

Pollution control

RELATED CONTENT

Date of birth extraction using precise shallow parsing
Proceedings of SPIE (January 18 2010)
Effectiveness of thesauri-aided retrieval
Proceedings of SPIE (January 07 1999)
Do Thesauri enhance rule-based categorization for OCR text?
Proceedings of SPIE (January 13 2003)
Evaluation of document image skew estimation techniques
Proceedings of SPIE (March 07 1996)
Title extraction and generation from OCR'd documents
Proceedings of SPIE (January 29 2007)