1 Introduction

Image segmentation is an image preprocessing technology, which can separate an image to several parts [1] and it is the important and fundamental procedure in image analysis. In result of image segmentation, internal characteristics of each part are similar and different parts are not. Existing image segmentation methods can be divided into two categories, one is supervised image segmentation and the other is unsupervised. In recent years, supervised image segmentation methods using deep learning such as Convolution Neural Network (CNN) [2] and Recurrent Neural Network (RNN) [3] have achieved remarkable results. At the same time, unsupervised image segmentation is also significant important in plenty of image processing applications such as disaster relief and aerial reconnaissance, which is too difficult to obtain accurate label samples for training sets beforehand.

ICM is simplified Pulse Coupled Neural Network (PCNN) model, which was introduced by Kinser [4] for the first time. This model is proposed for image processing, especially image segmentation, and it is computationally faster than PCNN model. Because of retaining the characteristics of pulse coupling, variable threshold and synchronous pulse distribution in PCNN model, ICM can effectively compensate for the discontinuity of data and help to preserve the regional information of the image. More importantly, ICM requires no training compared with tradition neural networks and novel deep learning such as BP, Hopfield and CNN. This model is based on study of the physiological and visual characteristics of mammals, many people are interested in it and many ICM-related approaches have been proposed, which have achieved competitive results. Most of existing ICM-related methods are used for gray image as external stimulus. In this paper, we also study ICM algorithm in order to propose a better object segmentation method for gray and color image segmentation.

Instead of considering the whole image, saliency detection can extract the most interest and important regions in a scene, according to people’s visual habits. More accurately, saliency-based image segmentation should be called object segmentation in fact. It aims to automatically assigning labels to the main object from original image. Saliency detection technique is the study result of visual attention of HVS, which aims to capture the most visually remarkable regions in an image. Visual saliency detection is an interdisciplinary subject which integrates cognitive psychology, neurology, mathematics, statistics, and so on. The result of saliency detection is called saliency map. For decades, many saliency methods have been proposed and applied to vision problems, including object segmentation [5], image retrieval [6], and image matching [7], to name a few. In these proposed saliency detection algorithms, CNN-based saliency methods [8, 9] have been prevailed, which can get high quality saliency detection result. But this kind of method has some disadvantages of complicated network structure and more runtime.

In this work, we use a novel saliency detection method to improve ICM, which can achieve better simulation of HVS to label the important object in an image. The goal of this work is to establish a saliency guided ICM for object segmentation without any training, we named it SG-ICM, in which saliency detection result become the external stimulus of ICM neurons rather than original input image. As an iterative model, ICM in this work use a novel strategy to speed up its convergence. The proposed method is compared with other competitive approaches both on subjective evaluation and objective evaluation. Experiments demonstrate its excellent performance of the proposed method in object segmentation.

This paper is structured as follows: Sect. 2 introduces the background knowledge include the standard ICM and saliency theory. Section 3, we briefly describe the proposed SG-ICM. The result analysis of the proposed method will be done in Sect. 4. At last, we conclude the whole work in Sect. 5.

2 Background Knowledge

2.1 The Standard ICM

Traditional ICM was presented in [4], in which each neuron consists of three parts: input field, modulation field, pulse generation. ICM is a simplified PCNN model for image processing, and each ICM neuron corresponds to an image pixel. Figure 1 shows the standard ICM structure which is widely used in image processing. The neurons with similar stimulation can pulse synchronously to form a segment of the input image. As shown in Fig. 1, each neuron consists of three parts: feeding and linking field, modulating field, and pulse generation. Neuron communicates with neighbor neurons through weight W. Its mathematic model can be described as below:

Fig. 1.
figure 1

The structure of standard ICM

$$ \begin{aligned} & F_{ij} [n + 1] = f\,F_{ij} [n] + S_{ij} + W_{ij} \{ Y\} \\ & Y_{ij} [n + 1] = \left\{ {\begin{array}{*{20}l} 1 \hfill & {F_{ij} [n + 1] > T_{ij} [n]} \hfill \\ 0 \hfill & {else} \hfill \\ \end{array} } \right. \\ & T_{ij} [n + 1] = gT_{ij} [n] + hY_{ij} [n + 1] \\ \end{aligned} $$
(1)

Where Sij is the ij-th pixel value of input image. Fij[n] is the status function of neurons, which can remain the neuron status. That is to say, the status function Fij[n] for each iterative is related to last one Fij[n − 1], which memory attenuates over time and the attenuation rate is influenced by attenuation factor f (f < 1). F, g, h are all scalar coefficients, and g < f < 1 will guarantee dynamic threshold will be less than status value of neuron at last. h is very large, so it can guarantee advance threshold after neuron firing, which not be inspired at the next iteration. Wij{} is connecting weight between neurons. Tij is dynamic threshold.

We can see standard ICM is a two dimensional structure, and it is only used for gray image. Study on some algorithms based ICM, it is clear that standard ICM cannot carry out multi-region image segmentation, and cannot deal with color image directly. In order to extend standard ICM applications and improve its efficiency, many scholars have proposed ICM-related methods [10]. In this work, we improve ICM based on a novel view, which is saliency-based ICM.

2.2 Saliency Detection

Saliency detection is a basic and complex technique in computer vision, which can guide computer to capture key information in an image according to human visual habit. When a pixel (or a region) in image is uniqueness, rarity, it may be salient. State-of-the-art saliency algorithms in general can be categorized as bottom-up and top-down approaches. The bottom-up approaches are data-driven and the other type methods are task-driven. A large number of novel methods are proposed based on the existing algorithms, such as Bayesian frameworks [11], ranking algorithms [12], differential equation [13], deep learning [14], etc.

How to highlight the salient object is still challenging problem, as shown in Fig. 2. The comparison algorithms used in the example are the latest or classical methods, including frequency method: FT [15], contrast method: RC [16], graph based method: GBVS [17], deep feature based method: ELD [18] and unsupervised method: UHM [19]. We can observe that pixels will get a higher gray value in the saliency maps when they are salient. The deep feature based method such as ELD can get the best accuracy and the highest recall rate, but this type of deep learning based methods needs a complex training process, which can affect the efficiency of algorithm. On the other hand, it is difficult to get training set for unknown environments detection. So we propose a novel unsupervised saliency method based on dynamic guided filtering in this paper.

Fig. 2.
figure 2

Saliency detection results (a) Original image (b) FT (c) RC (d) GBVS (e) UHM (f) ELD (g) GT (Ground Truth)

3 Methodology

3.1 SG-ICM

ICM is the result of research on the phenomenon of pulse-synchronous oscillation of mammalian visual cortical neurons, it comes from several visual cortex model, especially the Eckhorn model. At the same time, it absorbs advantages of other visual modes and is the product of cross-synthesis of various cerebral cortex models. It is developed based on the study of PCNN, but it is simpler than PCNN and is more suitable for object segmentation. More importantly, this model doesn’t require any training.

Clearly, although ICM can effectively realize the unsupervised image segmentation, it suffers from lots of iterations and parameter setting. Visual saliency method can make the prominent of every region or pixel in an image, and it has been used in many fields computer vision. Based on traditional ICM and visual saliency detection, we proposed SG-ICM, which is improvement of existing image segmentation methods. The proposed model is designed as follows:

$$ \begin{aligned} & F_{ij} [n + 1] = fF_{ij} [n] + Sal_{ij} \\ & Y_{ij} [n + 1] = \left\{ {\begin{array}{*{20}l} 1 \hfill & {F_{ij} [n + 1] > T_{ij} [n],n > n_{fire} } \hfill \\ 0 \hfill & {else\begin{array}{*{20}c} , & {n < n_{fire} } \\ \end{array} } \hfill \\ \end{array} } \right. \\ & T_{ij} [n + 1] = T_{ij} [n] + hY_{ij} [n] - T^{*} \\ & T^{*} = \left( {\hbox{max} \,Sal(i,j) - \hbox{min} \,Sal(i,j)} \right) /n \\ \end{aligned} $$
(2)

Where Salij is the ij-th pixel’s saliency, which is instead of the original image as external neuron input in the image. The saliency estimation in this work based on a pixel different level from its surroundings. Compared with the traditional ICM, the proposed model in this paper omit the natural linking term between neurons. In proposed model, all the parameters have the same definition as the traditional ICM. The novel model structure is presented in Fig. 3.

Fig. 3.
figure 3

The structure of SG-ICM

It can be noticed that proposed SG-ICM has simple structure and less parameters. Similar to the traditional ICM, we see each pixel in the saliency map as a neuron. In this novel model for object segmentation, only two parameters are preserved, namely, f and h, which maintains the main characteristics of traditional ICM and makes facilitate applications of the model. Compared with traditional ICM, the proposed SG-ICM updates the external neuron input, and can achieve better object segmentation results. At the same time, the output is accumulated, that is, the over-ignited neurons remain on fire all the time.

n is the iterative number. The traditional threshold function in ICM adopts the exponential attenuation mechanism which is more suitable for human vision. But the segmentation in this work only distinguishes objects from background, so when the pixel similarity between object and background or between different objects is poor. It will bring difficulties for segmentation, which is not conducive to implementation of segmentation algorithm. In this paper, we choose the linear decline to adjust the threshold:

$$ T^{*} = \left( {\hbox{max} \,Sal(i,j) - \hbox{min} \,Sal(i,j)} \right) /n $$
(3)

In addition, in order to make the saliency map play a positive role in image segmentation, it will be: 1. Salient region with larger value in saliency map should cover the most of the real object; 2. Salient region in saliency map should contain the most salient object, perhaps some background. 3. Salient object in saliency map should be the biggest in all saliency regions. Thus, our defined threshold would be closer to the average feature of the real object.

To analyze single neuron’s status, the math model denotes that is no linking matric between different neurons, when neurons ignite for the first time, n = 1 and internal activity can be expressed as:

$$ F_{ij} [n + 1] = fF_{ij} [n] + Sal_{ij} = (F_{ij} [n] - \frac{{Sal_{ij} }}{1 - f})f^{n} + \frac{{Sal_{ij} }}{1 - f} $$
(4)

In the proposed model, we add up the output results in each iteration, that is, when the neuron is fired, it will remain on firing during subsequent iterations.

3.2 Saliency Stimulus

In our work, we regard guided filtering analysis result as external neuron input, which is great improvement of traditional unsupervised saliency detection methods. In the following, we give the definition of saliency map.

As shown in Fig. 4, we model saliency detection as a dynamic guided filtering problem and proposed a two stage scheme for saliency detection. Figure 4 shows the main steps of the proposed saliency detection method. In the first stage, we exploit the iterative dynamic guided filtering to get the mainly structure of input image and remove irrelevance texture. Therefore, in the second stage, we exploit the boundaries of image and center regions prior to compute the coarse saliency of each pixel and highlight it to get the final saliency map [20, 21].

Fig. 4.
figure 4

Diagram of the proposed saliency detection method

Saliency detection aims to extract the most important regions of image according to human visual habit. The guidance image needs to get salient regions of image for us. As shows in Fig. 5, it is the dynamic filtering with various iterative steps based on guided filter. We can see guided filtering can remove the irrelevance texture, which may locate at the intra-object or intra-background. With the iteration increases, the detail in the image are gradually smoothed, at the same time, the main structure of image is preserved. Therefore, we use the dynamic guidance image to modulate input image for getting its main structure. This method is iterative and dynamic guidance image is updated at every step. The novel dynamic guided filtering kernel is given by:

Fig. 5.
figure 5

The dynamic filtering with various iterative steps (a) Original image (b)–(f) Iteration outputs of guided filtering

$$ W_{ij} (p,q^{t} ){ = }\frac{ 1}{{\left| \omega \right|^{{\mathbf{2}}} }}\sum\limits_{{{{k:(i,j)}} \in\varvec{\omega}_{{\mathbf{k}}} }} {\left( {1 + \frac{{(p_{i} - \mu_{k} )(q_{j}^{t} - \mu_{k} )}}{{\sigma_{k}^{2} + \varepsilon }}} \right)} $$
(5)

where qt is the t-th filtering output based on dynamic filter. Such as q1 is the 1-st filtering output based on dynamic filter. μk and σk are the variance and the mean of dynamic guidance image qt in window ωk, respectively.

The novel filtering kernel uses the joint structure information of input image and dynamic guidance image, the filtering output can preserve the main structure of input image and remove the noisy texture efficiently. The filtering output is given by:

$$ q_{i}^{t + 1} { = }\sum\nolimits_{j} {W_{ij} (p,q^{t} )p_{j} } $$
(6)

This novel iterative filter can also be the edge-preserving smoothing property like traditional guided filter. The guided filtering results are applied to find important pixels of image. If the pixels in image are salient, their saliency value should be high. Psychophysical study [21] shows that human attention favors center region. So pixels close to a natural image center could be salient in many cases. On the other hand, a natural image boundary could be background in many cases. With the constraints on boundary and center of image, we could get five coarse saliency results respectively.

$$ \begin{aligned} & S_{t} = \sum\limits_{{i \in \left\{ {L,a,b} \right\}}} {\left\| {c_{i} - m_{i} } \right\|_{2} } \\ & m = \frac{1}{\left| t \right|}\sum\limits_{{x \in \{ t\} }} x \\ \end{aligned} $$
(7)

Where St is coarse saliency map using top boundary of image. ci is the color vector of pixel c. m is average color of top boundary of image. Parameter t is the pixels set of top boundary. \( \left| t \right| \) is the pixels number of top boundary set.

Similarly, we compute the other four maps Sb, Sl, Sr, Sc, using the bottom, left, right image boundaries and center region of image. The five saliency maps are integrated by the following process:

$$ S = S_{t} \times S_{b} \times S_{l} \times S_{r} \times S_{c} $$
(8)

While most regions of the salient objects are detection in the above process, we get the coarse saliency maps, which may not be adequately highlighted. To improve the coarse results, we need to highlight the coarse results to make the salient object with higher saliency value.

Coarse saliency map which we have gotten is from five different saliency maps using Eq. (8), the integration process could weaken the background pixels. Because the five saliency maps are all normalized to the range [0,1], the integration will decrease the value of salient pixels. The integration process cannot highlight object region well. Figure 6(c) shows the saliency value of salient object in coarse saliency map is not adequately high. In this work, we need to highlight the salient pixels in coarse saliency map.

Fig. 6.
figure 6

Saliency map after highlighting (a) Input image (b) Filtering result (c) Integrating (d) highlighting

We define the final saliency map using the following function:

$$ SS = S^{{\prime }} \cdot \,\exp (\alpha S^{{\prime }} - 1_{{}} ) $$
(9)

where \( S^{\prime} \) is the normalized integration result. SS is the final saliency map. In practice, we found the coarse saliency map is not highlighting the whole salient region. Therefore we use an exponential function in order to emphasize salient pixels. In all experiments, we use α = 2 as the weight factor for the function. If \( S^{\prime} \) > 0.5, then the term \( \exp (\alpha S^{{\prime }} - 1_{{}} ) \) > 1. In other words, we magnify the pixel value of coarse saliency map when it is greater than 0.5, otherwise, reduction.

4 Experiments and Analysis

In our paper, experiments are consisted of four parts: the first part is comparing the proposed saliency method with several prior ones; The second part is doing segmentation experiment for color image; the third part is doing object segmentation for aerial image; the fourth part is doing the quantitative evaluation of segmentation accuracy.

4.1 Saliency Detection

We compare proposed saliency method with several prior ones, including contrast method: FT [15], SR [22], LC [23], HC [16], RC [16], GBVS [17], deep feature based method: ELD [18] and unsupervised method: UHM [19]. Figure 7 is saliency maps of above methods for four images from MSRA database. We can see our saliency method is superior in these comparison saliency detection algorithms. Figure 8 is the quantitative evaluation based on MSRA database. The PR-curve and F-measure express the same result that our method is the better saliency method than others.

Fig. 7.
figure 7

Saliency detection results of different methods (a) Input (b) FT (c) GBVS (d) LC (e) HC (f) RC (g) UHM (h) ELD (i) Our (j) GT

Fig. 8.
figure 8

Precision-recall curve and F-measure on MSRA-1000 database (a) PR curve (b) F-measure

4.2 Object Segmentation for Color Images

The proposed SG-ICM model introduces image saliency as stimulus for image segmentation. But the standard ICM is only used for gray image, and its model is a single layer two dimensional neural network. We calculate the image saliency based on superpixel and make it as the external neuron input in improved ICM. The color images are all selected randomly from MSRA database. The results show that the proposed method based saliency is more accuracy than other segmentation methods. This experiment is consisted of two parts:

  1. (1)

    The segmentation based on different saliency detection methods

    In this part, we use manifold ranking (MR) [24] based saliency detection as the compare method. The experiment is order to test affection of the proposed saliency method in object segmentation. Figure 9 shows the different object segmentation results based on different saliency detection methods in same segmentation model.

    Fig. 9.
    figure 9

    Object segmentation result based on different saliency methods (a) Input (b) Saliency map based on the proposed saliency algorithm (c) Segmentation result using saliency map (e) Object segmentation result (f) Saliency map using MR (g) Segmentation result using saliency map (h) Object segmentation result

  2. (2)

    The segmentation based on different segmentation methods

    Object segmentation method have been proposed, where Kmeans [25] is a popular segmentation method in image segmentation. In this part, we use the Kmeans as compare method, and make it based on same saliency detection method. So this experiment test the ICM model is more effective in image segmentation.

4.3 Object Segmentation for Aerial Images

For evaluating the effectiveness of the proposed algorithm, we select 100 aerial plane images for this experiment. Because of uneven illumination, low contrast or fast texture change, aerial images have brought many difficulties to image processing (Fig. 10). We use several different image segmentation algorithms to segment the same set of aerial plane images, include ICM, PCNN [26], SCM [27], OSTU [28], Maximum entropy [29] and two-dimension chi-square divergence [30] in Fig. 11. Compared with the neural network-based methods (PCNN and SCM), the input term of the proposed model is improved by saliency map in this paper, which makes the neurons in the salient region more likely to be motivated, so that the segmentation results can be obtained accurately.

Fig. 10.
figure 10

Comparison with different segmentation methods (a) Input (b) GT (c) Segmentation result using Kmeans (d) Object segmentation result (e) Segmentation result using Kmeans and saliency (g) Segmentation result using the proposed algorithm (h) Object segmentation result

Fig. 11.
figure 11

Segmentation results of different segmentation methods

4.4 Quantitative Evaluation of Segmentation Accuracy

In the experiment, three kinds of indexes were used to quantitatively evaluate for different algorithms. They are Misclassification Error (EM), Mean Misclassification Error (MME) and running time.

Firstly, we define ME as follow:

$$ ME = 1 - \frac{{\left| {B_{O} \cap B_{T} } \right| + \left| {F_{O} \cap F_{T} } \right|}}{{\left| {B_{O} } \right| + \left| {F_{O} } \right|}} $$
(10)

Where \( B_{o} \), \( F_{o} \) express the real background and foreground of image, respectively. is the background and foreground of segmentation image. Table 1 shows that the ME of each image from Fig. 11, and we can get the analysis result similar to segmentation result based on the data. For testing the effectiveness of the proposed algorithm, we are calculating MME based on test image database in Table 2, and MME of the proposed algorithm is the minimum value. Table 3 show the running time based on image database.

Table 1. ME for different images
Table 2. MME for test image database
Table 3. Mean running time for test image database

Based on Tables 2 and 3, we can see that the MME of the proposed algorithm is close to that of PCNN, but the computational complexity of the PCNN model is too high. The running time of chi-square divergence has the most advantage in comparison algorithm, unfortunately, its MME is too high to have good segmentation performance.

5 Conclusion

In this paper, Superpixel-based saliency guided intersecting cortical model for unsupervised object segmentation algorithm is proposed. The saliency map enables us not to calculate the global threshold of image segmentation, which facilitates the removal of unnecessary background interference in low-contrast aerial images and achieves more accurate object segmentation. It can provide reliable data quickly for object follow-up location and tracking, and can also realize automatic pre-analysis of data.