Elsevier

Pattern Recognition Letters

Volume 16, Issue 9, September 1995, Pages 955-962
Classification of document blocks using density feature and connectivity histogram

https://doi.org/10.1016/0167-8655(95)00039-J

Abstract

In this paper, we present a document block classification algorithm that automatically classifies the different types of blocks embedded in a document image. Two features, a density feature and a connectivity histogram, are devised for this purpose. In our approach, segmented document blocks are first classified into text and non-text blocks via the density feature. The connectivity histogram is then used to further classify non-text blocks into image and graphics blocks. Experimental results demonstrate the feasibility of the new technique for classifying document blocks.
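
The two-stage flow described in the abstract can be sketched roughly as follows. This is only an illustration of the control flow (density first, connectivity histogram second), not the authors' method: the abstract does not define either feature, so the definitions used here (density as the fraction of ink pixels, connectivity histogram as a count of ink pixels by number of ink 8-neighbours), the function names, and all threshold values are assumptions made for the example.

```python
from typing import List, Tuple

Block = List[List[int]]  # binary block: 1 = ink (black), 0 = background


def density(block: Block) -> float:
    """Fraction of ink pixels in the block (assumed definition)."""
    total = sum(len(row) for row in block)
    ink = sum(sum(row) for row in block)
    return ink / total if total else 0.0


def connectivity_histogram(block: Block) -> List[int]:
    """hist[k] = number of ink pixels with exactly k ink 8-neighbours
    (assumed definition; the paper's histogram may differ)."""
    height = len(block)
    width = len(block[0]) if height else 0
    hist = [0] * 9
    for y in range(height):
        for x in range(width):
            if not block[y][x]:
                continue
            k = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and block[ny][nx]:
                        k += 1
            hist[k] += 1
    return hist


def classify_block(
    block: Block,
    text_density_range: Tuple[float, float] = (0.1, 0.4),  # hypothetical
    image_full_ratio: float = 0.5,                          # hypothetical
) -> str:
    """Stage 1: text vs. non-text by density.
    Stage 2: image vs. graphics by connectivity histogram."""
    low, high = text_density_range
    if low <= density(block) <= high:
        return "text"
    hist = connectivity_histogram(block)
    ink = sum(hist)
    # Share of ink pixels that are fully surrounded by ink.
    full = hist[8] / ink if ink else 0.0
    return "image" if full >= image_full_ratio else "graphics"


# Toy blocks: a solid patch, a single thin line, and vertical stripes.
solid = [[1] * 10 for _ in range(10)]
line = [[1] * 20 if y == 10 else [0] * 20 for y in range(20)]
stripes = [[1 if x % 4 == 0 else 0 for x in range(10)] for _ in range(10)]

print(classify_block(solid), classify_block(line), classify_block(stripes))
```

With real document blocks the thresholds would have to be chosen from data, and the toy inputs above are only meant to exercise each branch of the decision.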

Cited by (17)

  • An automatic document processing system for medical data extraction

    2015, Measurement: Journal of the International Measurement Confederation
    Citation Excerpt:

    Indeed, confusing one exam for another can be dangerous in this kind of application. As regards the general aspects of the processing flow, it should be noted that the extraction and classification of data requires a lot of automatic decisions about where data are located and what is their meaning, and each decision may have deep consequences on subsequent processing [35]. This is in contrast with other industrial applications, where the authors have experienced less correlation between subsequent processing steps [37–41].

  • Parameter-free based two-stage method for binarizing degraded document images

    2012, Pattern Recognition
    Citation Excerpt:

    In the past decades, many document image operations, such as document layout analyzing [1,2], data hiding [3], skew estimation [4,5], stroke extraction [6], document block classification [7], and optical character recognition [8], have been developed for storing, transmitting, and managing digital documents.

  • Document zone content classification and its performance evaluation

    2006, Pattern Recognition
    Citation Excerpt:

    Based on the description of white space inside regions, it classified a given region into text, graphics, line art regions. Fan and Wang [24] presented a document block classification algorithm using density feature and connectivity histogram. The attribute of each segmented block is divided into three classes: text, graphics, and image.

This work is supported by the National Science Council of Taiwan under grant NSC-83-0408-E-008-001.
