
1 Introduction

As one of the most important and difficult tasks in image analysis and computer vision, image segmentation is defined as the partitioning of an image into non-overlapping, consistent regions. Although various image segmentation algorithms have been proposed, automated and accurate segmentation remains a very challenging research topic due to overlapping intensities, low contrast, noise perturbation, and the asymmetric form of the intensity distribution [1].

During the last decades, a number of model-based techniques [2] have been proposed, among which the standard Gaussian mixture model (GMM) [3] is a well-known method that has been widely used due to its simplicity and ease of implementation. The main advantage of the standard GMM is that its small number of parameters can be efficiently estimated with the expectation-maximization (EM) algorithm [4]. However, the GMM assumes that each pixel in an image is independent of its neighbors, so its performance is sensitive to outliers. To improve robustness to noise, mixture models with (hidden) Markov random fields ((H)MRF) have frequently been employed for pixel labeling [5, 6], where the prior distribution varies for every pixel and label and depends on the neighboring pixels. Another group of MRF-based mixture models is proposed in [7–10], where an MRF models the joint distribution of the priors of each pixel label. Although these models reduce the effect of noise on the segmentation result, most (H)MRF-based algorithms still lack sufficient robustness to noise, and their computational cost is quite high. On the other hand, using only one distribution for each component of the mixture model is unsatisfactory for many practical applications. Therefore, the mixture of mixture models and asymmetric mixture models have been widely studied recently [11–14]. However, without introducing any spatial information, the mixture of mixture models [13] and the asymmetric mixture models [1, 11] remain sensitive to outliers and noise, even though they fit the data distribution more flexibly.

Motivated by these observations, in this paper we propose a novel spatially constrained asymmetric GMM (SCAGMM) for image segmentation. We first modify the asymmetric distribution proposed in [11] so that the model can fit different shapes of observed data. Our asymmetric model is then constructed from the posterior and prior probabilities of the within-cluster and between-cluster memberships. Similar to the algorithm in [8], to define the similarity between neighboring priors for the within-cluster and between-cluster terms, we introduce two pseudo-likelihood quantities that couple the neighboring within-cluster and between-cluster priors, respectively, based on the Kullback-Leibler (KL) divergence. To estimate the involved parameters, we derive an EM algorithm that iteratively maximizes an approximation of the lower bound of the data log-likelihood. It is worth mentioning that the proposed prior distributions can be treated as within-cluster and between-cluster spatial constraints, constructed from the corresponding posterior and prior probabilities; they act as image filters for smoothing and restoring images corrupted by noise. The proposed scheme therefore incorporates spatial constraints into an EM framework simply and efficiently. The proposed algorithm is compared with other state-of-the-art segmentation algorithms on both simulated and real images to demonstrate its superior performance.

2 Background

Notations used throughout the paper are as follows. Let \(X=\{x_i,i=1,2,...,N\}\) denote the target image, where \(x_i\) with dimension D is the observation at the ith pixel. Let the neighborhood of the ith pixel be denoted by \(\partial _i\). Labels are denoted by \((\varOmega _1,\varOmega _2,...,\varOmega _K)\). To segment an image consisting of N pixels into K labels, the finite mixture model treats each observation \(x_i\) as statistically independent and models its density as a mixture over the labels \(\varOmega _k,k=1,...,K\). The corresponding density function is given by

$$\begin{aligned} p(x_i|\varPi ,\varTheta )=\sum _{k=1}^K\pi _{ik}p(x_i|\varOmega _k), \end{aligned}$$
(1)

where \(\varPi =\{\pi _{ik}\},i=\{1,2,...,N\},k=\{1,2,...,K\}\) is the set of prior distributions modeling the probability that pixel \(x_i\) is in label \(\varOmega _k\), which satisfies the constraints \(0\le \pi _{ik}\le 1\) and \(\sum _{k=1}^K\pi _{ik}=1\).

Each distribution \(p(x_i|\varOmega _k)\) is a component of the mixture model and can be any kind of distribution. In the GMM [3], \(p(x_i|\varOmega _k)\) is the Gaussian distribution \(\varPhi (x_i|\mu _k,\varSigma _k)\), which can be written in the form

$$\begin{aligned} \varPhi (x_i|\mu _k,\varSigma _k)=\frac{1}{(2\pi )^{D/2}|\varSigma _k|^{1/2}}\exp \left\{ -\frac{1}{2}(x_i-\mu _k)^T\varSigma _k^{-1}(x_i-\mu _k)\right\} , \end{aligned}$$
(2)

where \(\mu _k\) is the D-dimensional mean vector, \(\varSigma _k\) is the \(D\times D\) covariance matrix, and \(|\varSigma _k|\) is the determinant of \(\varSigma _k\).

We assume that each pixel i belongs to a single class indexed by the hidden random variable \(z_i\), which takes values from the discrete set of labels 1, ..., K. The corresponding generative model for the GMM is shown in Fig. 1(a).
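For concreteness, the following is a minimal numpy sketch of Eqs. (1) and (2); the array conventions (X as an \(N\times D\) matrix of pixel observations, priors pi as an \(N\times K\) matrix) are assumptions of this sketch, not notation from [3].

```python
import numpy as np

def gaussian_pdf(X, mu_k, Sigma_k):
    """Multivariate Gaussian density of Eq. (2), evaluated at every row of X."""
    D = X.shape[1]
    diff = X - mu_k                                   # (N, D)
    inv = np.linalg.inv(Sigma_k)
    maha = np.einsum('nd,de,ne->n', diff, inv, diff)  # Mahalanobis terms
    norm = (2.0 * np.pi) ** (D / 2.0) * np.sqrt(np.linalg.det(Sigma_k))
    return np.exp(-0.5 * maha) / norm                 # (N,)

def gmm_density(X, pi, mu, Sigma):
    """Mixture density of Eq. (1): sum_k pi_ik * Phi(x_i | mu_k, Sigma_k)."""
    comps = np.stack([gaussian_pdf(X, mu[k], Sigma[k])
                      for k in range(mu.shape[0])], axis=1)  # (N, K)
    return np.sum(pi * comps, axis=1)                 # (N,)
```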

In asymmetric mixture models [1, 11], a new asymmetric distribution \(p(x_i|\varOmega _k)\) is defined to fit different shapes of observed data, such as non-Gaussian and asymmetric distributions:

$$\begin{aligned} p(x_i|\varOmega _k)=\sum _{l=1}^L\psi _{kl}p(x_i|\varOmega _{kl}), \end{aligned}$$
(3)

where L is the number of multivariate distributions \(p(x_i|\varOmega _{kl})\) used to model the label \(\varOmega _k\), and \(\psi _{kl}\) is the weighting factor, which satisfies the constraints \(0\le \psi _{kl}\le 1\) and \(\sum _{l=1}^L\psi _{kl}=1\). The motivation for the distribution in Eq. 3 is that non-Gaussian and non-symmetric data can be approximated by multiple multivariate distributions \(p(x_i|\varOmega _{kl})\). It is worth mentioning that \(p(x_i|\varOmega _{kl})\) can be any of the widely used distributions, such as the Gaussian distribution, Student's t-distribution, or the generalized Gaussian distribution. The probabilistic graphical model for asymmetric mixture models is shown in Fig. 1(b).
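Continuing the sketch above (and reusing gaussian_pdf), the asymmetric component of Eq. (3) with Gaussian components is itself a small mixture; psi_k, mu_k and Sigma_k are hypothetical per-label parameter arrays of shapes (L,), (L, D) and (L, D, D).

```python
def asymmetric_component(X, psi_k, mu_k, Sigma_k):
    """Eq. (3): p(x_i | Omega_k) = sum_l psi_kl * Phi(x_i | mu_kl, Sigma_kl)."""
    comps = np.stack([gaussian_pdf(X, mu_k[l], Sigma_k[l])
                      for l in range(len(psi_k))], axis=1)  # (N, L)
    return comps @ psi_k                                    # (N,)
```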

To improve the robustness of GMMs to noise, an MRF distribution is applied to incorporate spatial correlation among the prior values:

$$\begin{aligned} p(\varPi )=\frac{1}{Z}\exp \left\{ -\frac{1}{T}U(\varPi )\right\} , \end{aligned}$$
(4)

where Z is a normalizing constant, T is a temperature constant, and \(U(\varPi )\) is the smoothing prior. The posterior probability density function given by Bayes' rule can be written as

$$\begin{aligned} p(\varPi ,\varTheta |X)\propto p(X|\varPi ,\varTheta )p(\varPi ). \end{aligned}$$
(5)

The involved parameters are usually estimated by maximizing the log-likelihood of Eq. 5 via the EM algorithm to obtain the final segmentation labels. These MRF-GMM models are shown graphically in Fig. 1(c).

Most MRF-based mixture models have been successfully applied to image segmentation by adopting different energy functions \(U(\varPi )\). The main motivation for using this model instead of the traditional MRF model on pixel labels is its flexibility with respect to initial conditions: the spatial constraints are enforced directly over the neighboring priors, which yields a smoother energy function and makes the algorithm less dependent on initialization.

Fig. 1. The probabilistic graphical models for (a) GMM, (b) asymmetric mixture models, (c) MRF-GMM, and (d) the proposed SCAGMM model.

3 Proposed Algorithm

In this paper, the modified density function \(f(x_i|\varPi ,\varPsi ,\varTheta )\) for each pixel \(x_i\) is defined as

$$\begin{aligned} f(x_i|\varPi ,\varPsi ,\varTheta )=\sum _{k=1}^K\pi _{ik}\left( \sum _{l=1}^{L}\psi _{ikl}\varPhi (x_i|\mu _{kl},\varSigma _{kl})\right) \end{aligned}$$
(6)

where K is the number of clusters/labels, L is the number of multivariate Gaussian distributions used to model the label \(\varOmega _k\), \(\varPi =\{\pi _{ik}\}\) is the set of between-cluster prior distributions modeling the probability that pixel \(x_i\) is in label \(\varOmega _k\), \(\varPsi =\{\psi _{ikl}\}\) is the set of within-cluster prior distributions modeling the probability that pixel \(x_i\) is in the lth component of the kth cluster, \(\varTheta \) represents the model parameters, and \(\varPhi \) is the Gaussian distribution. Since the observations \(x_i\) are modeled as statistically independent, the joint conditional density of the observed data set is

$$\begin{aligned} p(X|\varPi ,\varPsi ,\varTheta )=\prod _{i=1}^Nf(x_i|\varPi ,\varPsi ,\varTheta ) \end{aligned}$$
(7)

By Bayes' rule, the posterior probability density function can be written as

$$\begin{aligned} p(\varPi ,\varPsi ,\varTheta |X)\propto {p(X|\varPi ,\varPsi ,\varTheta )p(\varPi )p(\varPsi )} \end{aligned}$$
(8)

Therefore, the probabilistic graphical model for the proposed algorithm is shown in Fig. 1(d).

As in [15], we employ the Besag approximation for modeling the joint density over pixel priors

$$\begin{aligned} p(\varPi )\approx {\prod _{i=1}^Np(\pi _i|\pi _{\partial _i})} \quad p(\varPsi )\approx {\prod _{i=1}^Np(\psi _i|\psi _{\partial _i})} \end{aligned}$$
(9)

where \(\pi _{\partial _i}\) and \(\psi _{\partial _i}\) are respectively defined as mixture distributions over the between-cluster and within-cluster priors of neighboring pixels of pixel i, i.e.,

$$\begin{aligned} \pi _{\partial _i}=\sum \nolimits _{j\in {\partial _i},j\ne i}\lambda _{ij}\pi _j \quad \psi _{\partial _i}=\sum \nolimits _{j\in \partial _i,j\ne i}\lambda _{ij}\psi _j \end{aligned}$$
(10)

where the \(\lambda _{ij}\) are positive weights for each pixel \(j(j\in {\partial _i},j\ne i)\) satisfying \(\sum _j\lambda _{ij}=1\). It is worth mentioning that evaluating these mixtures corresponds to the convolution operations \(\pi _{\cdot k}*\lambda \) and \(\psi _{\cdot kl}*\lambda \) for each cluster k and Gaussian component l, where \(\lambda \) is a symmetric linear image filter with a zero coefficient at its center and nonnegative coefficients elsewhere that sum to one [8].
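A sketch of Eq. (10) as a per-channel convolution follows. The 3x3 filter below is only an illustrative choice consistent with the stated constraints (symmetric, zero center coefficient, entries summing to one); it is not the filter of Table 1.

```python
from scipy.ndimage import convolve

# Illustrative filter: uniform weights over the 8-neighborhood (an assumption).
lam = np.array([[1.0, 1.0, 1.0],
                [1.0, 0.0, 1.0],
                [1.0, 1.0, 1.0]]) / 8.0

def neighborhood_priors(Pi, lam):
    """Eq. (10) per channel: each prior map, arranged on the (H, W, K) image
    grid, is convolved with lam to give the neighborhood mixture pi_{di,k}."""
    return np.stack([convolve(Pi[..., k], lam, mode='nearest')
                     for k in range(Pi.shape[-1])], axis=-1)
```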

For the conditional densities \(p(\pi _i|\pi _{\partial _i})\) and \(p(\psi _i|\psi _{\partial _i})\), we assume an approximate log-model in the form

$$\begin{aligned} \begin{aligned} \log p(\pi _i|\pi _{\partial _i})=-\alpha [D(\pi _i\Vert \pi _{\partial _i})+H(\pi _i)] \\ \log p(\psi _i|\psi _{\partial _i})=-\beta [D(\psi _i\Vert \psi _{\partial _i})+H(\psi _i)] \end{aligned} \end{aligned}$$
(11)

where \(D(\pi _i\Vert \pi _{\partial _i})\) is the KL divergence between \(\pi _i\) and \(\pi _{\partial _i}\), and \(D(\psi _i\Vert \psi _{\partial _i})\) is the KL divergence between \(\psi _i\) and \(\psi _{\partial _i}\); both are nonnegative and become zero when \(\pi _i=\pi _{\partial _i}\) and \(\psi _i=\psi _{\partial _i}\), respectively. \(H(\pi _i)\) and \(H(\psi _i)\) are the entropies of the distributions \(\pi _i\) and \(\psi _i\). To facilitate optimization, we introduce an approximation that makes use of two auxiliary sets of distributions \(s_i^{(1)}\) and \(s_i^{(2)}\) as follows:

$$\begin{aligned} \begin{aligned} \log p(\pi _i|\pi _{\partial _i},s_i^{(1)})\approx -\alpha [D(s_i^{(1)}\Vert \pi _i)+D(s_i^{(1)}\Vert \pi _{\partial _i})+H(s_i^{(1)})] \\ \log p(\psi _i|\psi _{\partial _i},s_i^{(2)})\approx -\beta [D(s_i^{(2)}\Vert \psi _i)+D(s_i^{(2)}\Vert \psi _{\partial _i})+H(s_i^{(2)})] \end{aligned} \end{aligned}$$
(12)
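The KL divergence and entropy terms in Eqs. (11) and (12) reduce to row-wise sums over discrete distributions; a minimal sketch follows, with an eps guard (our own addition) to avoid log(0).

```python
def kl_div(p, q, eps=1e-12):
    """Row-wise D(p || q) = sum_k p_k log(p_k / q_k) over the last axis."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return np.sum(p * np.log(p / q), axis=-1)

def entropy(p, eps=1e-12):
    """Row-wise H(p) = -sum_k p_k log p_k over the last axis."""
    p = np.clip(p, eps, None)
    return -np.sum(p * np.log(p), axis=-1)
```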

Moreover, we introduce an additional penalty term involving posterior distributions in the form

$$\begin{aligned} \begin{aligned} -\frac{1}{4}[D(q_i^{(1)}\Vert p_i^{(1)})+D(q_i^{(1)}\Vert p_{\partial _i}^{(1)})+H(q_i^{(1)})] \\ -\frac{1}{4}[D(q_i^{(2)}\Vert p_i^{(2)})+D(q_i^{(2)}\Vert p_{\partial _i}^{(2)})+H(q_i^{(2)})] \end{aligned} \end{aligned}$$
(13)

where the coefficient 1/4 in the penalty term is chosen because it allows a tractable M-step. \(q_i^{(1)}\) and \(q_i^{(2)}\) are two arbitrary class distributions for pixel i, and the two posterior class distributions \(p_i^{(1)}\) and \(p_i^{(2)}\) are defined as follows:

$$\begin{aligned} p_{ik}^{(1)}\equiv \frac{\pi _{ik}\sum \nolimits _{l=1}^{L}\psi _{ikl}\varPhi (x_i|\mu _{kl},\varSigma _{kl})}{\sum \nolimits _{m=1}^K\left[ \pi _{im}\sum \nolimits _{l=1}^{L}\psi _{iml}\varPhi (x_i|\mu _{ml},\varSigma _{ml})\right] } \end{aligned}$$
(14)
$$\begin{aligned} p_{ikl}^{(2)}\equiv \frac{\psi _{ikl}\varPhi (x_i|\mu _{kl},\varSigma _{kl})}{\sum \nolimits _{m=1}^L\psi _{ikm}\varPhi (x_i|\mu _{km},\varSigma _{km})} \end{aligned}$$
(15)
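Both posteriors can be computed jointly from a precomputed table of Gaussian densities. The sketch below assumes a hypothetical array Phi of shape (N, K, L) holding \(\varPhi (x_i|\mu _{kl},\varSigma _{kl})\), with Pi of shape (N, K) and Psi of shape (N, K, L) flattened over the pixel grid.

```python
def posteriors(Pi, Psi, Phi, eps=1e-12):
    """Eqs. (14) and (15) for all pixels at once."""
    weighted = Psi * Phi                           # psi_ikl * Phi(x_i | ...)
    comp = np.sum(weighted, axis=2)                # (N, K), inner sums over l
    p1 = Pi * comp
    p1 = p1 / (np.sum(p1, axis=1, keepdims=True) + eps)  # Eq. (14), (N, K)
    p2 = weighted / (comp[:, :, None] + eps)             # Eq. (15), (N, K, L)
    return p1, p2
```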

Putting all the terms together, the penalized log-likelihood of the observed data becomes (ignoring constants)

$$\begin{aligned} \begin{aligned}&L(\varPi ,\varPsi ,\varTheta |X) \\&=\sum _{i=1}^N\left\{ \log \left( \sum _{k=1}^K\pi _{ik}\left( \sum _{l=1}^{L}\psi _{ikl}\varPhi (x_i|\mu _{kl},\varSigma _{kl})\right) \right) \right. \\&-\frac{1}{4}[D(q_i^{(1)}\Vert p_i^{(1)})+D(q_i^{(1)}\Vert p_{\partial _i}^{(1)})+H(q_i^{(1)})] \\&-\frac{1}{4}[D(q_i^{(2)}\Vert p_i^{(2)})+D(q_i^{(2)}\Vert p_{\partial _i}^{(2)})+H(q_i^{(2)})] \\&-\alpha [D(s_i^{(1)}\Vert \pi _i)+D(s_i^{(1)}\Vert \pi _{\partial _i})+H(s_i^{(1)})] \\&\left. -\beta [D(s_i^{(2)}\Vert \psi _i)+D(s_i^{(2)}\Vert \psi _{\partial _i})+H(s_i^{(2)})]\right\} \end{aligned} \end{aligned}$$
(16)

We then use the EM algorithm to maximize the objective \(L(\varPi ,\varPsi ,\varTheta |X)\) by coordinate ascent.

E-step: By fixing \(\varPi \), \(\varPsi \) and \(\varTheta \), we can optimize over \(s_i^{(1)}\), \(s_i^{(2)}\), \(q_i^{(1)}\) and \(q_i^{(2)}\) for each pixel i and get

$$\begin{aligned} \begin{aligned} s_{ik}^{(1)}\propto \pi _{ik}\pi _{\partial _ik}&\quad s_{ikl}^{(2)}\propto \psi _{ikl}\psi _{\partial _ikl} \\ q_{ik}^{(1)}\propto p_{ik}^{(1)}p_{\partial _ik}^{(1)}&\quad q_{ikl}^{(2)}\propto p_{ikl}^{(2)}p_{\partial _ikl}^{(2)} \end{aligned} \end{aligned}$$
(17)
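In code, the E-step of Eq. (17) amounts to elementwise products followed by renormalization. In this sketch the *_nb arguments denote the neighborhood mixtures of Eq. (10) (and the analogous mixtures of the posteriors), an assumption of our array layout.

```python
def e_step(Pi, Pi_nb, Psi, Psi_nb, p1, p1_nb, p2, p2_nb, eps=1e-12):
    """Eq. (17): each auxiliary distribution is proportional to the product of
    a quantity and its neighborhood mixture, renormalized over the last axis
    (labels k for s1/q1, components l within each label for s2/q2)."""
    def normalize(a):
        return a / (np.sum(a, axis=-1, keepdims=True) + eps)
    s1 = normalize(Pi * Pi_nb)    # s_ik^(1)
    s2 = normalize(Psi * Psi_nb)  # s_ikl^(2)
    q1 = normalize(p1 * p1_nb)    # q_ik^(1)
    q2 = normalize(p2 * p2_nb)    # q_ikl^(2)
    return s1, s2, q1, q2
```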

M-step: By fixing \(s_i^{(1)}\), \(s_i^{(2)}\), \(q_i^{(1)}\) and \(q_i^{(2)}\), we maximize \(L(\varPi ,\varPsi ,\varTheta |X)\) over \(\varPi \), \(\varPsi \) and \(\varTheta \), which yields the following update for each parameter:

$$\begin{aligned} \pi _{ik}=\frac{1}{1+4\alpha }\left( \frac{1}{4}(q_{ik}^{(1)}+q_{\partial _ik}^{(1)})+\alpha (s_{ik}^{(1)}+s_{\partial _ik}^{(1)})\right) \end{aligned}$$
(18)
$$\begin{aligned} \psi _{ikl}=\frac{1}{1+4\beta }\left( \frac{1}{4}(q_{ikl}^{(2)}+q_{\partial _ikl}^{(2)})+\beta (s_{ikl}^{(2)}+s_{\partial _ikl}^{(2)})\right) \end{aligned}$$
(19)
$$\begin{aligned} \mu _{kl}=\frac{\sum \nolimits _{i=1}^N(q_{ik}^{(1)}+q_{\partial _ik}^{(1)})(q_{ikl}^{(2)}+q_{\partial _ikl}^{(2)})x_i}{\sum \nolimits _{i=1}^N(q_{ik}^{(1)}+q_{\partial _ik}^{(1)})(q_{ikl}^{(2)}+q_{\partial _ikl}^{(2)})} \end{aligned}$$
(20)
$$\begin{aligned} \varSigma _{kl}=\frac{\sum \nolimits _{i=1}^N(q_{ik}^{(1)}+q_{\partial _ik}^{(1)})(q_{ikl}^{(2)}+q_{\partial _ikl}^{(2)})(x_i-\mu _{kl})(x_i-\mu _{kl})^T}{\sum \nolimits _{i=1}^N(q_{ik}^{(1)}+q_{\partial _ik}^{(1)})(q_{ikl}^{(2)}+q_{\partial _ikl}^{(2)})} \end{aligned}$$
(21)
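The M-step updates translate directly into weighted averages; the sketch below follows the same hypothetical array conventions as the E-step sketch (q1, s1 of shape (N, K); q2, s2 of shape (N, K, L); X of shape (N, D)).

```python
def m_step(X, q1, q1_nb, q2, q2_nb, s1, s1_nb, s2, s2_nb, alpha=0.25, beta=0.25):
    """Eqs. (18)-(21)."""
    Pi = (0.25 * (q1 + q1_nb) + alpha * (s1 + s1_nb)) / (1.0 + 4.0 * alpha)
    Psi = (0.25 * (q2 + q2_nb) + beta * (s2 + s2_nb)) / (1.0 + 4.0 * beta)
    w = (q1 + q1_nb)[:, :, None] * (q2 + q2_nb)  # weights of Eqs. (20)-(21)
    denom = np.sum(w, axis=0)                    # (K, L)
    mu = np.einsum('nkl,nd->kld', w, X) / denom[:, :, None]       # Eq. (20)
    diff = X[:, None, None, :] - mu[None]        # (N, K, L, D)
    Sigma = (np.einsum('nkl,nkld,nkle->klde', w, diff, diff)
             / denom[:, :, None, None])                           # Eq. (21)
    return Pi, Psi, mu, Sigma
```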

4 Experimental Results

In this section, we apply the proposed SCAGMM algorithm to segment the test images and compare its performance with four state-of-the-art algorithms: two spatially constrained finite mixture models (the Spatially Constrained Generative Model with an EM algorithm (SCGM) [8] and the Fast and Robust Spatially Constrained GMM (FRSCGMM) [10]) and two bounded finite mixture models (the Bounded Asymmetric Mixture Model (BAMM) [11] and the Bounded Generalized Gaussian Mixture Model (BGGMM) [15]). All algorithms are initialized with the k-means algorithm. Unless otherwise specified, the parameters of the proposed algorithm are set as follows. We assume that the number of labels K is given for each test image. The number of Gaussian distributions L is set to three (\(L=3\)). The prior parameters \(\alpha \) and \(\beta \) are both set to 0.25. All reported results of our algorithm were obtained using the filter coefficients \(\lambda \) shown in Table 1.

Table 1. Coefficients of the filter used in the experiments
Fig. 2. The experimental results on a synthetic image with Gaussian noise. (a) Noisy image; segmentation results obtained by (b) BAMM (MCR=0.0869), (c) BGGMM (MCR=0.0848), (d) SCGM (MCR=0.0058), (e) FRSCGMM (MCR=0.0094), and (f) the proposed SCAGMM algorithm (MCR=0.0040).

In the first experiment, a synthetic image corrupted with Gaussian noise (0 mean, 0.05 variance), shown in Fig. 2(a), was used to compare the proposed algorithm with the other methods. The image contains four labels with luminance values [0, 1/3, 2/3, 1]. Figures 2(b) to (f) present the segmentation results obtained by BAMM, BGGMM, SCGM, FRSCGMM and SCAGMM, respectively. The marked rectangles show enlarged partial views of the corresponding regions. From Fig. 2 we can see that the segmentation accuracies of BAMM and BGGMM, which do not consider any spatial information, are quite poor. FRSCGMM obtains a better result, but its noise suppression ability is still limited. Both SCGM and the proposed algorithm classify well and segment the image cleanly. Nevertheless, the proposed method achieves a lower misclassification ratio (MCR), especially for the pixels around the boundaries.
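For reference, the MCR reported above is simply the fraction of misclassified pixels; a one-line sketch, assuming integer label maps already permutation-aligned to the ground truth:

```python
def mcr(pred, truth):
    """Misclassification ratio: fraction of pixels whose label differs."""
    return np.mean(pred != truth)
```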

Table 2. MCR values (mean\(\pm \)standard deviation) of segmentations obtained by applying the five algorithms to synthetic images with varying noise levels.

Six synthetic images (http://siddhantahuja.wordpress.com/2009/11/21/segmentation-dataset/) were then used to examine the behavior of the different methods at varying noise levels. The original images were corrupted with increasing levels of Gaussian, salt-and-pepper, and speckle noise. As shown in Table 2, the proposed method demonstrates a higher degree of robustness at the given noise levels.

Fig. 3. Illustration of three simulated T1-weighted brain MR images with 9% noise and the corresponding segmentation results obtained by each algorithm. In each subfigure, the images from left to right show the original image, the segmentation results obtained by SCGM-EM, FRSCGMM, BAMM, BGGMM and the proposed algorithm, and the ground truth. The enlarged partial view of the corresponding marked region is shown below each figure.

In the next experiment, we applied all the algorithms to synthetic T1-weighted 1mm brain MR images selected from BrainWeb [16]. Three example brain MR images (the 80th axial, sagittal and coronal slices) with 9% noise, together with their segmentation results and ground truths, are shown in Fig. 3. Both SCGM-EM and FRSCGMM use a strategy similar to the proposed algorithm, constructing the spatial information from the posterior and prior probabilities. However, these two algorithms only apply a symmetric smoothing filter over the whole image, and the corresponding segmentation results are not satisfactory for the pixels around boundaries. Moreover, both comparison algorithms may produce over-smoothed segmentations and lose details, especially for the CSF tissue. Without considering any spatial information, neither BAMM nor BGGMM can reliably distinguish noisy pixels. Compared with the ground truth and the segmentations obtained by the other algorithms, the proposed algorithm visually achieves better results. To show the statistical significance of the proposed algorithm, we tested it on 80 brain MR images with noise levels ranging from 3% to 9%. The segmentation accuracy for each tissue, i.e. gray matter (GM) and white matter (WM), was measured in terms of the average Dice coefficient (DC) [17], and the statistical results (means and standard deviations of the DC values) are shown in Fig. 4. Note that the DC ranges from 0 to 1, with a higher value representing a more accurate segmentation. This comparison demonstrates that the proposed algorithm produces the most accurate segmentations (higher mean DC values) and has the best denoising ability and robustness (lower standard deviations of the DC values).
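The DC for a single tissue reduces to the ratio of twice the overlap to the total mask size; a minimal sketch on binary masks (e.g. the GM mask of each segmentation):

```python
def dice(pred_mask, truth_mask):
    """Dice coefficient 2|A & B| / (|A| + |B|) for one tissue's binary masks."""
    inter = np.logical_and(pred_mask, truth_mask).sum()
    return 2.0 * inter / (pred_mask.sum() + truth_mask.sum())
```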

Fig. 4. DC values of GM and WM segmentations obtained by applying the five segmentation algorithms to simulated brain MR images with increasing levels of noise.

Fig. 5. Berkeley image segmentation results by SCAGMM.

Fig. 6. PR values of segmentation results on 15 Berkeley color images.

Moreover, we show segmentation results for real-world color images in RGB color space, selected from the Berkeley image segmentation dataset [18]. A probabilistic evaluation can be performed with the probabilistic rand (PR) index [19], which takes values between 0 and 1, with a higher value representing a more accurate segmentation. Fig. 5 shows several segmentation results obtained by the proposed SCAGMM algorithm, where the number of labels is 2, 3, and 4 from the first column to the third column, respectively. The proposed algorithm clearly obtains satisfactory results for all the test images. In addition, a set of color images was used to evaluate the proposed algorithm against the other four methods; Fig. 6 shows the PR values obtained by all methods for this set of real-world images. As evident from the results, the proposed algorithm outperforms the other methods with higher PR values.
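For a single ground-truth segmentation, the PR index reduces to the Rand index, which can be computed from a label contingency table without enumerating pixel pairs; a sketch follows (the full PR index of [19] additionally averages agreement over multiple human segmentations):

```python
def rand_index(pred, truth):
    """Pair-counting agreement between two integer label maps."""
    pred, truth = pred.ravel(), truth.ravel()
    n = pred.size
    cont = np.zeros((pred.max() + 1, truth.max() + 1))
    np.add.at(cont, (pred, truth), 1.0)           # label co-occurrence counts
    S = np.sum(cont ** 2)
    A = np.sum(cont.sum(axis=1) ** 2)             # same-label pairs in pred
    B = np.sum(cont.sum(axis=0) ** 2)             # same-label pairs in truth
    # ordered-pair agreements (excluding self-pairs), normalized to [0, 1]
    return (n * n + 2.0 * S - A - B - n) / (n * (n - 1.0))
```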

5 Conclusion

To further improve the segmentation accuracy of GMM-based algorithms, in this paper we propose a spatially constrained asymmetric Gaussian mixture model for image segmentation. The algorithm has the flexibility to fit different shapes of observed data and successfully overcomes drawbacks of existing EM-type mixture models, including limited robustness to outliers, over-smoothed segmentations, and limited segmentation accuracy for image details. Our results on synthetic images, brain MR images and real-world color images show that the proposed algorithm produces more accurate segmentation results than several state-of-the-art algorithms. Our future work will focus on automatically determining the radius of the spatial factor and the number of clusters/labels.