Elsevier

Pattern Recognition

Volume 47, Issue 3, March 2014, Pages 1330-1348

A generalized multiclass histogram thresholding approach based on mixture modelling

https://doi.org/10.1016/j.patcog.2013.09.004

Highlights

  • Generalizing thresholding to multi-modal class segmentation.

  • Classes are modeled using mixtures of Generalized Gaussian distributions.

  • Formulation of thresholding based on maximum likelihood estimation.

  • Application to image foreground segmentation.

Abstract

This paper presents a new approach to multi-class thresholding-based segmentation. It considerably improves existing thresholding methods by efficiently modeling non-Gaussian and multi-modal class-conditional distributions using mixtures of generalized Gaussian distributions (MoGG). The proposed approach seamlessly: (1) extends the standard Otsu's method to arbitrary numbers of thresholds and (2) extends the Kittler and Illingworth minimum error thresholding to non-Gaussian and multi-modal class-conditional data. MoGGs enable efficient representation of heavy-tailed data and multi-modal histograms with flat or sharply shaped peaks. Experiments on synthetic data and real-world image segmentation show the performance of the proposed approach with comparison to recent state-of-the-art techniques.

Introduction

Thresholding-based image segmentation is a well-known technique that is used in a broad range of applications, such as change detection [20], object recognition [3], [34] and document image analysis [26], to name a few. Image thresholding aims at building a partition of an image into $K$ classes, $C_1, \dots, C_K$, which are separated by $K-1$ thresholds $T_1, \dots, T_{K-1}$. In the case of $K=2$, the image is segmented into foreground and background regions. In the case of $K>2$, the image is segmented into $K$ distinct regions. In most existing thresholding methods, the parameter $K$ is given and corresponds to the number of histogram modes [27]. Comparative studies of existing thresholding techniques applied to image segmentation can be found in [10], [24], [27], [32].

Among the most popular methods for image thresholding are the standard Otsu's method [22] and Kittler and Illingworth's method [14]. While the former uses inter-class separability to calculate optimal thresholds between classes, the latter is based on the minimization of the Bayes classification error, where each class is modeled by a Gaussian distribution. Both methods, however, assume a uni-modal shape for classes and use the sample mean and standard deviation (i.e., the parameters of a Gaussian) to approximate their distributions. In [12], [32], the authors established the relationship between the two methods, showing that these parameters can be obtained in either method using maximum likelihood estimation of a Gaussian model for each class. Entropy and relative entropy can also be used to derive good thresholds for image segmentation when the distribution of classes is Gaussian [5], [6], [25]. For example, Jiulun and Winxin [12] gave a relative-entropy interpretation for the minimum error thresholding (MET) [14], [19]. In that work, the Kullback–Leibler divergence [15] is used to measure the discrepancy between the histogram of a source image and a mixture of two Gaussians. Recently, Xue and Titterington [30] proposed a thresholding method where class data are modeled by Laplacian distributions. They showed that the obtained thresholds offer better separation of classes when their distributions are skewed, heavy-tailed or contaminated by outliers. Indeed, the location and dispersion parameters of the Laplacian distribution are the median and the absolute deviation from the median, which are more robust to outliers than the sample mean and standard deviation, respectively [11].

Previous methods for image thresholding were essentially devised to separate classes that are unimodal [27]. Therefore, they are not suited to multi-modal class segmentation. For instance, in many segmentation problems (e.g., medical images and remote sensing), one might want to separate the image foreground from a background region, each of which may have a multi-modal distribution. Another limitation of the standard methods [14], [22], as pointed out in [30], lies in the assumption that class data are Gaussian. In several image examples, one can find histogram modes that are skewed, sharply peaked or heavy-tailed, making the assumption of Gaussian-distributed classes unrealistic. Recently, researchers have used other distribution types to provide better image thresholding methods by modeling histogram classes using, for instance, Poisson [23], generalized Gaussian [2], [7], [8], skew-normal [31] and Rayleigh [29] distributions. However, these approaches are also built on the assumption that all classes are unimodal. Worth mentioning is the parallel trend of using mixture methods for segmentation (e.g., [1], [21], [35], [36]), where data are clustered into classes determined by the components of a learned mixture model. For such works, the number of classes (which corresponds to the number of mixture components) can be estimated using information-theoretic criteria such as AIC, BIC, MML, etc. [18]. This paper deals with a different problem, which consists of finding thresholds between classes whose distributions may comprise arbitrary numbers of (non-Gaussian) histogram modes. Thus, contrary to [1], [21], the number of classes $K$ will not necessarily correspond to the number of histogram modes.

In this paper, we propose a new thresholding approach that performs segmentation for multi-modal classes with arbitrarily shaped modes. We generalize the aforementioned state-of-the-art techniques, which are based on single probability density functions (pdfs), to mixtures of generalized Gaussian distributions (MoGGs). The generalized Gaussian distribution (GGD) is a generalization of the Laplacian and the normal distributions in that it has an additional degree of freedom that controls its kurtosis. Therefore, histogram modes, ranging from sharply peaked to flat ones, can be accurately represented using this model. Furthermore, skewed and multi-modal classes are accurately represented using mixtures of GGDs. We propose an objective function that finds optimal thresholds for multi-modal classes of data. It also extends easily to arbitrary numbers of classes ($K>2$) with reasonable computational time. Experiments on synthetic data, as well as real-world image segmentation, show the performance of the proposed approach.
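As an illustration of how the extra shape parameter lets the GGD move between the Laplacian and the normal distribution, the following minimal Python sketch evaluates a GGD density. It assumes the common location/scale/shape form f(x) = β / (2αΓ(1/β)) · exp(−(|x − μ|/α)^β), which may differ from the parametrization used in the paper; the names `ggd_pdf`, `mu`, `alpha` and `beta` are illustrative only.

```python
import math


def ggd_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Gaussian density under an ASSUMED parametrization.

    beta = 1 gives a Laplacian with scale alpha.
    beta = 2 gives a normal distribution with standard deviation alpha / sqrt(2).
    Smaller beta means sharper peaks and heavier tails; larger beta means flatter peaks.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    # Normalizing constant so that the density integrates to one.
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x - mu) / alpha) ** beta))


# Laplacian special case: the density at the mode is 1 / (2 * alpha).
laplace_peak = ggd_pdf(0.0, alpha=1.0, beta=1.0)

# Standard normal special case: alpha = sqrt(2) corresponds to sigma = 1.
normal_peak = ggd_pdf(0.0, alpha=math.sqrt(2.0), beta=2.0)
```

With `beta` below 1 the same function yields modes that are more sharply peaked than a Laplacian, and with large `beta` it approaches a flat, nearly uniform mode.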

This paper is organized as follows: Section 2 presents state-of-the-art theory for thresholding techniques. In Section 3 we outline our proposed approach for image thresholding. Experimental results are given in Section 4. We end the paper with a conclusion and some future work perspectives.

Section snippets

General formulation of the Otsu's method (case K=2)

Let $X=\{x_1, x_2, \dots, x_N\}$ be the gray levels of the pixels of an image $I$ of size $N = H \times W$; $H$ and $W$ being the height and the width of the image. Let $\mathbf{t}=(t_1, t_2, \dots, t_{K-1})$ be a set of thresholds that partitions an image into $K$ classes. First we consider the simple case of $K=2$. The most general case of $K>2$ will be elaborated later in this paper. In the case of $K=2$, one threshold $t$ yields two classes $C_1(t)=\{x : 0 \le x \le t\}$ and $C_2(t)=\{x : t+1 \le x \le T\}$, where $T$ is the maximum gray level. Finally, we denote by $h(x)$ the histogram
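For the two-class partition C1(t), C2(t) defined above, the classical Otsu criterion can be sketched in a few lines of Python. This is the textbook between-class-variance form of Otsu's method, not a reproduction of the paper's own equations, which are not shown in this excerpt; the function name `otsu_threshold` and the list-based histogram are assumptions made for illustration.

```python
def otsu_threshold(hist):
    """Return the threshold t that maximizes the between-class variance.

    hist[x] is the number of pixels with gray level x, for x = 0..T.
    Class 1 holds levels 0..t and class 2 holds levels t+1..T.
    """
    total = sum(hist)
    total_sum = sum(x * h for x, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    n1 = 0  # number of pixels in class 1
    s1 = 0  # sum of gray levels in class 1
    for t in range(len(hist) - 1):
        n1 += hist[t]
        s1 += t * hist[t]
        n2 = total - n1
        if n1 == 0 or n2 == 0:
            continue  # one class is empty, so the split is not valid
        mu1 = s1 / n1
        mu2 = (total_sum - s1) / n2
        # Between-class variance: w1 * w2 * (mu1 - mu2)^2.
        score = (n1 / total) * (n2 / total) * (mu1 - mu2) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Toy histogram with two well-separated groups of gray levels.
toy_hist = [5, 5, 0, 0, 0, 0, 5, 5]
toy_threshold = otsu_threshold(toy_hist)
```

The running sums make a single pass over the gray levels sufficient for one threshold. Ties between equally good thresholds are resolved in favor of the smallest one.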

Multi-modal class thresholding

We propose to extend the general model formulated by Eqs. (1), (2), (3) to represent multi-modal class-conditional distributions using finite mixture models (FMM). Finite mixtures are a flexible and powerful probabilistic tool for modeling univariate and multivariate data [9], [18]. They allow for modeling randomly generated data from multiple sources in an unsupervised fashion. Recently, Allili et al. [1] proposed to use finite mixtures of generalized Gaussian distributions (MoGG) to model
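As a reminder of what a finite mixture looks like, a class-conditional density built from several components can be written as below. The symbols $M_k$ (number of components in class $C_k$), $\pi_{kj}$ (mixing weights) and $\theta_{kj}$ (component parameters) are notation assumed here for illustration and may not match the paper's own.

```latex
p(x \mid C_k) \;=\; \sum_{j=1}^{M_k} \pi_{kj}\, f(x \mid \theta_{kj}),
\qquad \pi_{kj} \ge 0, \qquad \sum_{j=1}^{M_k} \pi_{kj} = 1 .
```

With $M_k = 1$ this reduces to a single density per class, which is the unimodal setting of the standard methods.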

Experimental results

We conducted several experiments to measure the performance of the proposed approach by comparing it to recent state-of-the-art thresholding methods. For this purpose, we used synthetic histograms as well as real images from known datasets [27]. Quantitative results are presented showing how well the proposed model finds optimal thresholds in terms of segmentation accuracy.

We objectively measure thresholding performance using the misclassification error (ME) criterion. For foreground/background
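A minimal sketch of the ME criterion for a two-class result is given below. It assumes the definition commonly used in the thresholding literature, ME = 1 − (|B_O ∩ B_T| + |F_O ∩ F_T|) / (|B_O| + |F_O|), where B_O and F_O are the ground-truth background and foreground and B_T and F_T are those of the thresholded image; whether the paper uses exactly this form cannot be confirmed from the excerpt. The function name and the flat 0/1 label lists are illustrative.

```python
def misclassification_error(ground_truth, result):
    """Fraction of pixels whose label differs from the ground truth.

    Both arguments are flat sequences of labels, 0 for background and
    1 for foreground, with one entry per pixel.
    """
    if len(ground_truth) != len(result):
        raise ValueError("label sequences must have the same length")
    if len(ground_truth) == 0:
        raise ValueError("label sequences must not be empty")
    # Pixels assigned to the same class in both images.
    agree = sum(1 for g, r in zip(ground_truth, result) if g == r)
    return 1.0 - agree / len(ground_truth)


# One background pixel out of four is wrongly labeled as foreground.
example_me = misclassification_error([0, 0, 1, 1], [0, 1, 1, 1])
```

Under this definition ME is 0 for a perfect segmentation and 1 when every pixel is assigned to the wrong class.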

Computational time complexity

The computational complexity of the standard Otsu's method and its median extension grows exponentially with the number of thresholds. Both methods compute the optimal thresholds in $O(N^L)$ operations, where $N$ is the number of gray levels, $L=K-1$ is the number of thresholds and $K$ is the number of mixture components. This complexity is due to the minimization of the criterion functions $J_O(\mathbf{t})$ and $J_M(\mathbf{t})$, which are $L$-dimensional (i.e., functions of $L$ variables). This property
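To give a feel for this growth, the short Python sketch below counts the candidate threshold vectors that an exhaustive search would have to evaluate. It assumes gray levels 0..N−1 and strictly increasing thresholds chosen from 0..N−2, which gives a binomial count that grows on the order of N^L for fixed L; the function names are hypothetical and the counting convention is an assumption, not taken from the paper.

```python
from itertools import combinations
from math import comb


def count_threshold_vectors(num_levels, num_thresholds):
    """Number of strictly increasing threshold tuples (t_1 < ... < t_L)."""
    return comb(num_levels - 1, num_thresholds)


def enumerate_threshold_vectors(num_levels, num_thresholds):
    """Explicitly list every candidate tuple (only practical for tiny inputs)."""
    return list(combinations(range(num_levels - 1), num_thresholds))


# Candidate counts for an 8-bit image (256 gray levels).
counts_8bit = {L: count_threshold_vectors(256, L) for L in (1, 2, 3)}
# counts_8bit == {1: 255, 2: 32385, 3: 2731135}
```

Each extra threshold multiplies the search space by roughly two orders of magnitude for 8-bit images.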

Remarks and discussion

In this section, we present some remarks and caveats about the approaches studied in this paper, and about thresholding-based segmentation in general. These revolve around the multi-modality of the criterion function used for threshold determination. We observed in some cases that the best threshold is given not by the global minimum but by a local minimum. This limitation was already observed for the standard Otsu's method in [13]. It has been demonstrated that the objective function may not only be

Conclusion

A new thresholding approach, based on the mixture of generalized Gaussians model (MoGG method), is presented in this paper. The approach generalizes previous methods to multi-thresholding and multi-modal classes. It has been successfully tested on the segmentation of real images (NDT images [27]) and randomly generated data sets. Experiments have shown that it can achieve better thresholds than the standard Otsu's method [22], the median

Conflict of Interest statement

None declared.

References (36)

  • P.K. Sahoo et al., Image thresholding using two-dimensional Tsallis–Havrda–Charvát entropy, Pattern Recognition Letters (2006)

  • S. Wang et al., A novel image thresholding method based on Parzen window estimate, Pattern Recognition (2008)

  • J.-H. Xue et al., Median-based image thresholding, Image and Vision Computing (2011)

  • J.-H. Xue et al., Ridler and Calvard's, Kittler and Illingworth's and Otsu's methods for image thresholding, Pattern Recognition Letters (2012)

  • P.-Y. Yin et al., A fast iterative scheme for multilevel thresholding methods, Signal Processing (1997)

  • M.S. Allili et al., Finite General Gaussian mixture modelling and application to image and video foreground segmentation, Journal of Electronic Imaging (2008)

  • Y. Bazi et al., Image thresholding based on the EM algorithm and the generalized Gaussian distribution, Pattern Recognition (2008)

  • B. Bhanu et al., Adaptive Integrated Image Segmentation and Object Recognition, IEEE Transactions on Systems, Man, and Cybernetics, Part C (2000)
Aïssa Boulmerka received the B.Eng. and M.Sc. degrees in computer science from École nationale Supérieure en Informatique (Algeria) in 2004 and 2009, respectively. Since 2010, he has been pursuing Ph.D. studies at École nationale Supérieure en Informatique (Algeria) and Université du Québec en Outaouais (Canada) under the supervision of Professor Samy Ait-Aoudia and Professor Mohand Saïd Allili. His primary research interests include statistical models and application to image segmentation, computer vision and pattern recognition.

Mohand Saïd Allili received the M.Sc. and Ph.D. degrees in computer science from the University of Sherbrooke, Sherbrooke, QC, Canada, in 2004 and 2008, respectively. Since June 2008, he has been an Assistant Professor of computer science with the Department of Computer Science and Engineering, Université du Québec en Outaouais, Canada. His main research interests include computer vision and graphics, image processing, pattern recognition, and machine learning. Dr. Allili was a recipient of the Best Ph.D. Thesis Award in engineering and natural sciences from the University of Sherbrooke for 2008 and the Best Student Paper and Best Vision Paper awards for two of his papers at the Canadian Conference on Computer and Robot Vision 2007 and 2010, respectively.

Samy Ait-Aoudia received a DEA “Diplôme d'Etudes Approfondies” in image processing from Saint-Etienne University, France, in 1990. He received a Ph.D. degree in computer science from the Ecole des Mines, Saint-Etienne, France, in 1994. He is currently a Professor of computer science at the National High School in Computer Science at Algiers, Algeria, where he is involved in teaching at the BSc and MSc levels in computer science and software engineering. His areas of research include image processing, CAD/CAM and constraints management in solid modeling.
