Rough-fuzzy clustering and multiresolution image analysis for text-graphics segmentation

doi:10.1016/j.asoc.2015.01.049

Applied Soft Computing

Volume 30, May 2015, Pages 705-721

https://doi.org/10.1016/j.asoc.2015.01.049

Highlights

• A new method is proposed for text-graphics segmentation.
• M-band wavelet packet is used to extract scale-space features for document image.
• Unsupervised feature selection method is proposed to select relevant and non-redundant features.
• Rough-fuzzy clustering is used to address uncertainty problem of document segmentation.
• The approach is invariant under font size of text, scanning resolution and type of layout.

Abstract

This paper presents a segmentation method, judiciously integrating the merits of rough-fuzzy computing and multiresolution image analysis, for documents having both text and graphics regions. It assumes that the text and non-text (graphics) regions of a given document have different textural properties. M-band wavelet packet analysis and the rough-fuzzy-possibilistic c-means algorithm are used for the text-graphics segmentation problem. The M-band wavelet packet is used to extract the scale-space features; it offers a huge range of possible scale-space features for a document image and is able to zoom into narrow-band high-frequency components. A scale-space feature vector, taken at different scales, is thus derived for each pixel in an image. However, the decomposition scheme employing the M-band wavelet packet leads to a large number of redundant features. In this regard, an unsupervised feature selection method is introduced to select a set of relevant and non-redundant features for the text-graphics segmentation problem. Finally, the rough-fuzzy-possibilistic c-means algorithm is used to address the uncertainty problem of document segmentation. The whole approach is invariant under the font size, line orientation, and script of the text. The performance of the proposed technique, along with a comparison with related approaches, is demonstrated on a set of real-life document images.

Introduction

With the advances in information technology, automated processing of documents has become an imperative need. As the world moves closer to the concept of the paperless office, more and more communication and storage of documents are performed digitally. Against this background, there is a great demand for software that automatically extracts, analyzes, and stores information from physical documents for later retrieval. However, documents in digitized form require a large amount of storage space, even after being compressed using advanced techniques. Text-graphics segmentation partitions a document image into distinct regions corresponding to the text and non-text parts. In effect, it facilitates efficient searching and storage of the text parts of the documents.

Many techniques have been proposed to segment a document image into text and non-text regions [1], [2]. The most popular among them are the top-down and bottom-up approaches. The top-down techniques rely on the difference in contrast between the foreground and background to split the document into columns, paragraphs, text lines, and possibly words. Projection profiles [3], [4] are popular top-down approaches that work by identifying the white spaces through vertical and horizontal projections. On the other hand, bottom-up methods, which are similarity-based document segmentation approaches, tend to cluster pixels with similar intensities to obtain higher level descriptions. The run length smoothing algorithm [5], [6] is an example of a bottom-up approach, which applies region growing to detect text regions. The Docstrum proposed in [7] is another bottom-up method, which groups connected components of the same type using nearest neighbor information, starting from the pixel level to obtain higher level descriptions of the document such as words, text lines, paragraphs, and so on. Each of these methods assumes rectangular blocks of text and graphics, and is sensitive to textural properties such as font size, text line orientation, and inter-character spacing. Hence, these methods are not effective when no a priori knowledge about the content and attributes of the document image is available.
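
The two families of methods described above can be illustrated with a minimal sketch. It is not the implementation from the cited works: the function names, the 0/1 list-of-lists image format, and the parameters `min_gap` and `threshold` are assumptions chosen for illustration, and the run-length smoothing shown is a simplified variant that only fills background runs lying between two foreground pixels of one row.

```python
def horizontal_profile(binary):
    """Top-down cue: count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in binary]


def split_on_gaps(profile, min_gap=1):
    """Return (start, end) row ranges (end exclusive) separated by at least
    min_gap consecutive empty rows of the projection profile."""
    blocks = []
    start = None
    gap = 0
    for i, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                blocks.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        blocks.append((start, len(profile) - gap))
    return blocks


def rlsa_row(row, threshold):
    """Bottom-up cue: fill background runs of length <= threshold that lie
    between two foreground pixels of a single row."""
    out = list(row)
    last_fg = None
    for i, value in enumerate(row):
        if value:
            if last_fg is not None and 0 < i - last_fg - 1 <= threshold:
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out


page = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
line_blocks = split_on_gaps(horizontal_profile(page), min_gap=2)
smoothed = rlsa_row([1, 0, 0, 1, 0, 0, 0, 0, 1], threshold=2)
```

Both functions depend on fixed gap thresholds, which is one way to see why such methods are sensitive to font size and inter-character spacing.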

Nicolas et al. [8] extracted the text from a document image using a 2-D conditional random field model that integrates contextual knowledge and machine learning techniques. Another approach, for the binarization of a document image, was proposed based on a Bayesian framework using a Markov random field model of the image [9]. Junga et al. [10] achieved text segmentation by applying a region growing procedure based on the response of a stroke filter and then improved the segmentation by using an OCR feedback procedure. Chen and Wu [11] developed a document segmentation approach that integrates multiplane segmentation and a multilevel thresholding method. Su et al. [12] proposed a method to segment a degraded document image by combining a contrast map and the Canny edge detector, followed by thresholding.

Recently, wavelet techniques have become powerful tools in the document image analysis domain. They are particularly good at describing a scene in terms of the scale of the textures in it. Li and Gray [13] used features based on the distribution characteristics of wavelet coefficients in high frequency bands to segment document images into four classes, namely, background, photograph, text, and graph. Kundu and Acharyya [14] proposed a text-graphics segmentation method based on wavelet scale-space features followed by the k-means clustering algorithm, while Deng et al. [15] used a cubic B-spline wavelet for feature extraction and k-means for text-graphics segmentation. On the other hand, Lee et al. [16] proposed an algorithm based on local energy estimation in the wavelet packet domain and the k-means algorithm. Kumar et al. [17] used globally matched wavelet filters and a Markov random field model to segment document images into text, background, and picture components. Haneda and Bouman [18] combined cost optimized segmentation and connected component classification in a multiscale framework in order to improve the text-graphics segmentation accuracy for compression.

Against this background, this paper presents a text-graphics segmentation method that judiciously integrates the merits of rough-fuzzy clustering and multiresolution image analysis. The proposed method decomposes a composite image into multiresolution, multidirectional feature subbands using wavelet analysis. According to Chang and Kuo [19], most of the significant textural features are prevalent in the intermediate frequency subbands [20]. In this regard, the M-band wavelet packet transform is used in the proposed method for feature extraction. It recursively decomposes both the high frequency and low frequency bands at each scale. However, the complete decomposition tree, obtained by decomposing all the subbands at each scale, is usually not required. Hence, an appropriate method of selecting the significant and relevant features is required. Subsequently, features are computed from this set of selected bases by using nonlinear energy estimation followed by a smoothing filter. The use of the M-band wavelet packet decomposition gives rise to a large number of features, which incurs redundancy. Therefore, selection of the appropriate features using some basis selection algorithm is required. Since no a priori knowledge about the image is available, an unsupervised approach is proposed for feature selection. Finally, the rough-fuzzy-possibilistic c-means (RFPCM) algorithm [21] is used to segment the document image using the feature vectors. It adds the fuzzy memberships (both probabilistic and possibilistic) of fuzzy sets, and the lower and upper approximations of rough sets, to the k-means or c-means algorithm. While the membership of fuzzy sets enables efficient handling of overlapping partitions, the rough sets deal with uncertainty, vagueness, and incompleteness in cluster definition. Due to the integration of both probabilistic and possibilistic memberships, the RFPCM avoids the noise sensitivity of fuzzy c-means [22] and the coincident clusters of possibilistic c-means [23]. Also, the concept of a crisp lower bound and a fuzzy boundary of a cluster, introduced in rough-fuzzy-possibilistic c-means, enables efficient selection of cluster prototypes. The performance of the proposed method, along with a comparison with other related approaches, is demonstrated both qualitatively and quantitatively on a set of real-life document images.
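
The difference between an ordinary wavelet transform and a wavelet packet, namely that the packet splits the high frequency bands as well as the low frequency ones, can be sketched as follows. This is only an illustrative stand-in: it uses the 1-D orthonormal Haar filter pair (M = 2) on a plain Python list, whereas the paper works with M-band filters on images, and the function names are assumptions.

```python
import math


def haar_split(signal):
    """One 2-band analysis step on an even-length signal: (lowpass, highpass)."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def wavelet_packet(signal, levels):
    """Full packet tree: every subband, low and high alike, is split again at
    each level, giving 2**levels subbands (M**levels for an M-band filter bank)."""
    bands = [list(signal)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            low, high = haar_split(band)
            next_bands.extend([low, high])
        bands = next_bands
    return bands


def band_energy(band):
    """Mean squared coefficient, a simple per-subband energy measure."""
    return sum(c * c for c in band) / len(band)


# Smooth segment followed by a rapidly oscillating segment.
signal = [4, 4, 4, 4, 1, -1, 1, -1]
bands = wavelet_packet(signal, levels=2)
energies = [band_energy(b) for b in bands]
```

Subbands with negligible energy carry little information, which suggests why a full tree is usually unnecessary and some form of subband or feature selection is applied afterwards.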

The structure of the rest of this paper is as follows: Section 2 describes the proposed text-graphics segmentation method based on rough-fuzzy-possibilistic c-means and M-band wavelet packet. An unsupervised feature selection algorithm is introduced to select relevant and non-redundant features for segmentation. Section 3 presents the experimental results on several document images and a comparison among different methods. Finally, concluding remarks are given in Section 4.

Section snippets

Proposed text-graphics segmentation method

The proposed text-graphics segmentation algorithm based on M-band wavelet packet and rough-fuzzy-possibilistic c-means is illustrated in Fig. 1. The algorithm proceeds as follows:

1. The input image is decomposed using M-band wavelet packet into m number of subbands based on energy estimation of each subband with respect to two threshold values.
2. These outputs are subjected to nonlinear operation followed by smoothing operation.
3. The unsupervised feature selection algorithm is applied to the feature
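
Step 2 of the list can be sketched as a local energy estimate. The snippet above does not state which nonlinearity or smoothing filter the authors use, so the absolute value and the box (moving-average) window below are assumptions, as are the function name `local_energy` and the `radius` parameter.

```python
def local_energy(subband, radius=1):
    """Rectify subband coefficients with abs() (assumed nonlinearity), then
    average them over a square window of side 2*radius + 1 (assumed smoothing).
    The window is clipped at the image borders."""
    rows, cols = len(subband), len(subband[0])
    rectified = [[abs(v) for v in row] for row in subband]
    feature = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += rectified[rr][cc]
                    count += 1
            feature[r][c] = total / count
    return feature


# A single strong coefficient is spread over its neighbourhood.
subband = [
    [0, 0, 0],
    [0, -9, 0],
    [0, 0, 0],
]
feature_map = local_energy(subband, radius=1)
```

Stacking one such map per selected subband would give each pixel a feature vector, which is the input expected by the later selection and clustering steps.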

Experimental results

The proposed text-graphics segmentation method judiciously integrates the merits of rough-fuzzy-possibilistic c-means (RFPCM) [21] and M-band wavelet packet analysis. In this section, the performance of the proposed method is extensively compared with that of different clustering algorithms and several feature extraction techniques. The clustering algorithms compared are k-means or hard c-means (HCM) [33], fuzzy c-means (FCM) [22], [34], possibilistic c-means (PCM) [23], fuzzy-possibilistic c
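
To make the contrast between two of the compared algorithms concrete, the sketch below evaluates the standard fuzzy c-means membership and possibilistic c-means typicality formulas for a single point. It is not the experimental code of the paper: the function names, the fuzzifier value `m = 2`, and the scale parameters `etas` are assumptions, and all distances are assumed to be nonzero.

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy c-means memberships of one point from its distances to the c
    prototypes: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)). They sum to 1."""
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


def pcm_typicalities(distances, etas, m=2.0):
    """Possibilistic c-means typicalities:
    t_i = 1 / (1 + (d_i ** 2 / eta_i) ** (1 / (m - 1))). No sum-to-one constraint."""
    exponent = 1.0 / (m - 1.0)
    return [
        1.0 / (1.0 + (d * d / eta) ** exponent)
        for d, eta in zip(distances, etas)
    ]


# A point close to the first of two prototypes.
inlier_u = fcm_memberships([1.0, 3.0])
inlier_t = pcm_typicalities([1.0, 3.0], etas=[1.0, 1.0])

# A noisy point far from both prototypes.
outlier_u = fcm_memberships([10.0, 10.0])
outlier_t = pcm_typicalities([10.0, 10.0], etas=[1.0, 1.0])
```

The far-away point still receives a fuzzy membership of 0.5 in each cluster because the memberships must sum to one, while its typicalities are both close to zero. Conversely, because each typicality depends on a single prototype only, nothing prevents two prototypes from drifting to the same location.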

Conclusion

In this paper, a new methodology is presented, integrating judiciously the merits of rough-fuzzy-possibilistic c-means algorithm and multiresolution image analysis, for segmenting the text part from the graphics region based on textural cues. The rough-fuzzy-possibilistic c-means combines c-means algorithm, rough sets, and probabilistic and possibilistic memberships of fuzzy sets. This formulation is geared towards maximizing the utility of both rough sets and fuzzy sets with respect to
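
The rough-set side of such a combination, a crisp lower approximation plus a boundary region per cluster, can be sketched as follows. This is a simplified illustration of how rough-fuzzy clustering is commonly formulated, not the authors' RFPCM implementation: the function names are hypothetical, and using the average gap between the two highest memberships as the threshold `delta` is one heuristic reported in the rough-fuzzy clustering literature, assumed here.

```python
def rough_partition(memberships, delta):
    """memberships[j][i] is the membership of object j in cluster i.
    An object whose top two memberships differ by more than delta goes to the
    lower approximation of its best cluster; otherwise it is placed in the
    boundary regions of its two best clusters."""
    c = len(memberships[0])
    lower = [[] for _ in range(c)]
    boundary = [[] for _ in range(c)]
    for j, u in enumerate(memberships):
        order = sorted(range(c), key=lambda i: u[i], reverse=True)
        best, second = order[0], order[1]
        if u[best] - u[second] > delta:
            lower[best].append(j)
        else:
            boundary[best].append(j)
            boundary[second].append(j)
    return lower, boundary


def average_gap(memberships):
    """Mean difference between the highest and second highest membership."""
    gaps = []
    for u in memberships:
        top = sorted(u, reverse=True)
        gaps.append(top[0] - top[1])
    return sum(gaps) / len(gaps)


memberships = [
    [0.90, 0.10],  # clearly cluster 0
    [0.55, 0.45],  # ambiguous
    [0.20, 0.80],  # clearly cluster 1
]
delta = average_gap(memberships)
lower, boundary = rough_partition(memberships, delta)
```

In formulations of this kind, prototypes are then typically updated as a weighted combination of the lower-approximation objects and the boundary objects, so that ambiguous pixels influence a cluster less than unambiguous ones.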

Acknowledgements

This work is partially supported by the Indian National Science Academy, New Delhi, India (grant no. SP/YSP/68/2012). The authors would like to thank the anonymous referees and Prof. M.K. Kundu of the Indian Statistical Institute, Kolkata, for providing helpful comments and valuable criticisms on the original version of the manuscript, which have greatly improved the presentation of the paper.

References (35)

P. Chauvet et al. System for an intelligent office document analysis, recognition and description. Signal Process. (1993)
F.M. Wahl et al. Block segmentation and text extraction in mixed text/image document. Comput. Graphics Image Process. (1982)
Y.-L. Chen et al. A multi-plane approach for text segmentation of complex document images. Pattern Recognit. (2009)
S. Deng et al. Document segmentation using polynomial spline wavelets. Pattern Recognit. (2001)
N. Saito et al. Discriminant feature extraction using empirical probability density estimation and a local basis library. Pattern Recognit. (2002)
K. Huang et al. Information-theoretic wavelet packet subband selection for texture classification. Signal Process. (2006)
K. Etemad et al. Multiscale segmentation of unstructured document pages using soft decision integration. IEEE Trans. Pattern Anal. Mach. Intell. (1997)
S.N. Srihari. Document image understanding.
G. Nagy et al. A prototype document image analysis for technical journals. Computer (1992)
M. Krishnamoorthy et al. Syntactic segmentation and labeling of digitized pages from technical journals. IEEE Trans. Pattern Anal. Mach. Intell. (1993)
L. O’Gorman. The document spectrum for page layout analysis. IEEE Trans. Pattern Anal. Mach. Intell. (1993)
S. Nicolas et al. Document Image Segmentation Using a 2D Conditional Random Field Model.
T. Lelore et al. Document Image Binarisation Using Markov Field Model.
C. Junga et al. A new approach for text segmentation using a stroke filter. Signal Process. (2008)
B. Su et al. Robust document image binarization technique for degraded document images. IEEE Trans. Image Process. (2013)
J. Li et al. Context-based multiscale classification of document images using wavelet coefficient distributions. IEEE Trans. Image Process. (2000)
M. Acharyya et al. Document image segmentation using wavelet scale-space features. IEEE Trans. Circuits Syst. Video Technol. (2002)
