7 March 1996 Document zone classification using sizes of connected components
Jisheng Liang, Ihsin T. Phillips, Jaekyu Ha, Robert M. Haralick
Proceedings Volume 2660, Document Recognition III; (1996) https://doi.org/10.1117/12.234719
Event: Electronic Imaging: Science and Technology, 1996, San Jose, CA, United States
Abstract
In this paper, we describe a feature-based supervised zone classifier that uses only the widths and heights of the connected components within a given zone. The distribution of these widths and heights is encoded into an n × m dimensional feature vector for decision making. Thus, the computational complexity is on the order of the number of connected components within the zone. A binary decision tree assigns a zone class on the basis of its feature vector. The training and testing data sets for the algorithm are drawn from the scientific document pages in the UW-I database. The classifier assigns each scientific and technical document zone, in real time, to one of eight labels: text of font size 8-12, text of font size 13-18, text of font size 19-36, display math, table, halftone, line drawing, and ruling. It discriminates text from non-text with an accuracy greater than 97%.
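As an illustration only, the encoding of component sizes into an n × m vector might be sketched as a flattened two-dimensional histogram. The abstract does not give the binning scheme, so the function name `size_histogram`, the bin edges, and the normalization by component count below are all assumptions rather than the authors' method. The single pass over the bounding boxes reflects the stated cost, which is linear in the number of connected components.

```python
from bisect import bisect_right


def size_histogram(boxes, width_edges, height_edges):
    """Flatten a 2-D histogram of component (width, height) pairs into a vector.

    boxes: iterable of (width, height) in pixels, one per connected component.
    width_edges, height_edges: ascending bin boundaries (hypothetical values);
    k edges produce k + 1 bins, the last one open-ended.
    """
    n = len(width_edges) + 1   # number of width bins
    m = len(height_edges) + 1  # number of height bins
    counts = [0] * (n * m)
    total = 0
    for w, h in boxes:  # one pass: cost is linear in the number of components
        i = bisect_right(width_edges, w)   # width bin index, 0 .. n-1
        j = bisect_right(height_edges, h)  # height bin index, 0 .. m-1
        counts[i * m + j] += 1
        total += 1
    if total == 0:
        return [0.0] * (n * m)
    # Normalize so zones with different component counts are comparable (assumption).
    return [c / total for c in counts]


# Hypothetical bounding-box sizes for the components of a single zone.
boxes = [(5, 7), (6, 8), (40, 3), (100, 90)]
edges = [8, 16, 32]  # assumed bin boundaries in pixels, used for both axes
features = size_histogram(boxes, edges, edges)  # n = m = 4, so 16 entries
```

A vector like `features` would then be the input to the decision tree. In practice the bin boundaries would need to be chosen relative to the scan resolution, which the abstract does not state.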
© (1996) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jisheng Liang, Ihsin T. Phillips, Jaekyu Ha, and Robert M. Haralick "Document zone classification using sizes of connected components", Proc. SPIE 2660, Document Recognition III, (7 March 1996); https://doi.org/10.1117/12.234719
CITATIONS
Cited by 9 scholarly publications.
KEYWORDS
Halftones
Mathematics
Databases
Image classification
Image processing
Error analysis
Image analysis