Learning to identify hundreds of flex-form documents
7 January 1999
Janusz Wnek
Proceedings Volume 3651, Document Recognition and Retrieval VI; (1999) https://doi.org/10.1117/12.335815
Event: Electronic Imaging '99, 1999, San Jose, CA, United States
Abstract
This paper presents an inductive document classifier (IDC) and its application to document identification. The most important features of the system are its learning capability, its ability to handle large volumes of highly variant documents, and its high performance. IDC learns new document types (variants) from examples. To this end, it automatically extracts discriminatory features from images of various document types, generates generalized descriptions, and stores them in the knowledge base. An unknown document is classified by matching its description against all general rules in the knowledge base and selecting the best-matching document types as the final classifications. Both the learning and the identification processes are fast and accurate. Speed is gained through optimal image processing and feature construction procedures. Identification accuracy is very high even though the discriminatory features are generated solely from page layout information. IDC operates in two separate components of an electronic document management system (EDMS): the Knowledge Base Maintainer (KBM) and the Production Identifier (PI). KBM builds the knowledge base and maintains its integrity. PI uses the learned knowledge during identification.
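The learn-then-match loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual features, the rule representation, or the matching measure, so everything below is an assumption made for illustration: the layout features (`n_lines`, `n_blocks`, `top_margin`), the `Rule` class, generalization by per-feature min/max ranges, and a score equal to the fraction of features that fall inside the learned range.

```python
from dataclasses import dataclass

# Hypothetical page-layout features; the abstract does not list the real ones.
Features = dict[str, float]


@dataclass
class Rule:
    """Assumed form of a generalized description: one [low, high] range per feature."""
    doc_type: str
    ranges: dict[str, tuple[float, float]]


def learn_rule(doc_type: str, examples: list[Features]) -> Rule:
    """Generalize a non-empty list of examples by taking each feature's min and max."""
    names = examples[0].keys()
    ranges = {
        n: (min(e[n] for e in examples), max(e[n] for e in examples))
        for n in names
    }
    return Rule(doc_type, ranges)


def match_score(rule: Rule, doc: Features) -> float:
    """Fraction of the rule's features whose value in `doc` lies inside the learned range."""
    hits = sum(
        1
        for name, (low, high) in rule.ranges.items()
        if name in doc and low <= doc[name] <= high
    )
    return hits / len(rule.ranges)


def identify(knowledge_base: list[Rule], doc: Features, top_k: int = 1):
    """Match `doc` against every rule and return the top_k (doc_type, score) pairs."""
    scored = sorted(
        ((match_score(rule, doc), rule.doc_type) for rule in knowledge_base),
        key=lambda pair: -pair[0],
    )
    return [(doc_type, score) for score, doc_type in scored[:top_k]]


# Knowledge base built from two invented document types with two examples each.
knowledge_base = [
    learn_rule("invoice", [
        {"n_lines": 12, "n_blocks": 5, "top_margin": 0.10},
        {"n_lines": 14, "n_blocks": 6, "top_margin": 0.12},
    ]),
    learn_rule("claim_form", [
        {"n_lines": 30, "n_blocks": 9, "top_margin": 0.05},
        {"n_lines": 34, "n_blocks": 10, "top_margin": 0.06},
    ]),
]

unknown = {"n_lines": 13, "n_blocks": 5, "top_margin": 0.11}
print(identify(knowledge_base, unknown))  # [('invoice', 1.0)]
```

In this sketch the learning step plays the role the abstract assigns to KBM and `identify` plays the role of PI; a real system would derive the features from page images and would likely use a richer rule language than simple ranges.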
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Janusz Wnek "Learning to identify hundreds of flex-form documents", Proc. SPIE 3651, Document Recognition and Retrieval VI, (7 January 1999); https://doi.org/10.1117/12.335815
CITATIONS
Cited by 3 scholarly publications and 9 patents.
KEYWORDS
Image processing
Classification systems
Computing systems
Databases
Document management
Optical character recognition
Remote sensing
