Document zone classification using sizes of connected components

Jisheng Liang; Ihsin T. Phillips; Jaekyu Ha; Robert M. Haralick

doi:10.1117/12.234719

7 March 1996

Proceedings Volume 2660, Document Recognition III; (1996) https://doi.org/10.1117/12.234719
Event: Electronic Imaging: Science and Technology, 1996, San Jose, CA, United States

Abstract

In this paper, we describe a feature-based supervised zone classifier that uses only the widths and heights of the connected components within a given zone. The distribution of these widths and heights is encoded into an n × m dimensional feature vector for decision making. Thus, the computational complexity is on the order of the number of connected components within the given zone. A binary decision tree assigns a zone class on the basis of its feature vector. The training and testing data sets for the algorithm are drawn from the scientific document pages in the UW-I database. The classifier assigns each given scientific and technical document zone, in real time, one of eight labels: text of font size 8-12, text of font size 13-18, text of font size 19-36, display math, table, halftone, line drawing, and ruling. It discriminates text from non-text with an accuracy greater than 97%.
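The encoding step described in the abstract can be pictured as a two-dimensional histogram over component widths and heights, flattened into a single vector. The following is only a minimal sketch of that idea, not the authors' implementation: the function name `size_histogram_feature`, the bin boundaries, the normalization by component count, and the sample sizes are all hypothetical choices made for illustration. The abstract does not specify the binning scheme or whether counts are normalized.

```python
from bisect import bisect_right


def size_histogram_feature(sizes, width_edges, height_edges):
    """Encode (width, height) pairs as a flattened n x m histogram.

    sizes: iterable of (width, height) pairs, one per connected component.
    width_edges, height_edges: ascending interior bin boundaries
        (hypothetical values; k boundaries define k + 1 bins).
    Returns a list of n * m relative frequencies (assumed normalization).
    """
    n = len(width_edges) + 1   # number of width bins
    m = len(height_edges) + 1  # number of height bins
    counts = [0] * (n * m)
    total = 0
    # One pass over the components, so the work grows with their number.
    for w, h in sizes:
        i = bisect_right(width_edges, w)   # width bin index
        j = bisect_right(height_edges, h)  # height bin index
        counts[i * m + j] += 1
        total += 1
    if total == 0:
        return [0.0] * (n * m)
    return [c / total for c in counts]


# Hypothetical zone: three character-sized components, one thin
# horizontal stroke, and one large blob (sizes in pixels).
zone_sizes = [(10, 14), (12, 15), (11, 14), (40, 3), (200, 180)]
feature = size_histogram_feature(
    zone_sizes, width_edges=[16, 64], height_edges=[8, 32]
)
```

With two boundaries per axis, `feature` has 3 × 3 = 9 entries. A vector of this kind could then be passed to a decision tree classifier trained on labeled zones.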

Citation

Jisheng Liang, Ihsin T. Phillips, Jaekyu Ha, and Robert M. Haralick "Document zone classification using sizes of connected components", Proc. SPIE 2660, Document Recognition III, (7 March 1996); https://doi.org/10.1117/12.234719
