1 Introduction

Image saliency detection has become a research hotspot in the field of computer vision. It filters out redundant visual information of an image by imitating the human visual attention mechanism and selects the objects of interest to human eyes, which are called salient objects. Efficient saliency detection models have been applied to numerous computer vision scenarios, such as image classification [1], object detection [2], image retrieval [3], and so forth.

Corresponding to the visual attention mechanism, existing saliency detection models can be broadly categorized as bottom-up [4,5,6] or top-down [7, 8] approaches. Bottom-up models mainly rely on color, intensity, contrast, and other low-level visual features of the image, so their detection speed is fast. Top-down approaches add higher-level prior knowledge and are more complex. In this paper, our work focuses on bottom-up saliency detection models. Many such models use low-level features of the image such as color and contrast [9, 10]. They perform well in many situations but still struggle on complex images. Based on the assumption that salient objects are mostly located at the center of the image, some researchers regard the image boundary region as the background and propose background-prior saliency detection models [11, 12]. These perform well in many cases but fail for salient objects that appear at the edge of the image: when the background template contains foreground noise, detection performance degrades.

In this paper, we propose a novel saliency detection method based on foreground and background propagation. First, we obtain a background template by selecting the image border superpixels, which have been proved to be good indicators of the background prior in saliency detection [13, 14]. Second, we calculate the color and spatial distances between each superpixel and the background labels to obtain the background-prior map. Third, we compute the convex hull of the image's salient points and obtain the foreground-prior map from the differences between superpixels inside and outside the convex hull. Finally, we use a label propagation algorithm to propagate saliency information over the foreground-prior and background-prior maps, and integrate the two propagated maps into the final saliency result.

2 Related Work

Significant progress has been made in visual saliency detection in recent years; a quantitative analysis of different saliency methods can be found in [15]. Most bottom-up saliency detection methods are based on contrast or background cues. It has been verified that contrast is an effective cue for producing satisfying results. Contrast-based methods can be roughly divided into local and global techniques. The model of Itti et al. [4] is a typical local contrast-based method: it defines saliency by calculating center-surround differences in color, intensity, and orientation. Harel et al. [16] improved Itti's method by adding Markov chains to the computation of saliency maps. Achanta et al. [17] proposed a multi-scale contrast saliency detection method that calculates contrasts of feature vectors between the inner and outer regions of a sliding window. Hou et al. [18] proposed a frequency-based method, which uses the spectral residual in the frequency domain to extract salient regions of the image. Global contrast-based approaches use contrast relationships over the whole image to calculate the saliency of single pixels or image regions. Goferman et al. [6] proposed a context-aware saliency detection method that computes the average of saliency values at multiple scales to obtain salient regions. Achanta et al. [19] proposed a frequency-tuned method that directly defines saliency as the color difference between each pixel and the mean image color. Cheng et al. [5] proposed a saliency detection method based on color contrast and spatial position features, using color contrast weighted by spatial distance to define the salient regions.

Background-based methods typically regard image border pixels as background labels and compute saliency by querying or propagating from these labels. Yang et al. [20] proposed a saliency detection algorithm based on manifold ranking, which ranks the similarity of the image's superpixels to the background labels on a graph to obtain saliency maps. Li et al. [21] regard boundary superpixels as background templates and obtain salient results by constructing dense and sparse reconstruction errors. Wang et al. [22] proposed a method based on foreground and background seed selection, which uses image border superpixels to generate foreground and background saliency maps. Zhu et al. [23] proposed a saliency detection method based on boundary connectivity, which characterizes the spatial layout of image regions. Zhang et al. [24] proposed an approach based on local structure propagation, which updates saliency values under the guidance of local propagation.

3 Proposed Method

In this section, we present an efficient and effective saliency detection method that integrates foreground and background information of the image, as shown in Fig. 1. We first abstract the image into superpixels using simple linear iterative clustering (SLIC for short). Then, we compute the background-prior map by selecting boundary labels and use a convex hull to generate the foreground-prior map. Finally, we apply label propagation saliency (LPS for short) to the foreground-prior and background-prior maps respectively, and integrate the two propagated saliency maps into a pixel-wise saliency map.

Fig. 1. Flowchart of the proposed method. It contains three stages: abstracting the image; computing the background-prior and foreground-prior maps; generating a pixel-wise saliency map that integrates the two propagated saliency maps.

3.1 Background Labels Selection

The strategy for selecting background labels affects the precision of the saliency map. Based on the observation that salient objects are likely to appear near the center of an image, the image border is usually selected as the set of background labels. However, the border may contain foreground noise, which degrades the detection results. Therefore, we propose a mechanism based on image boundary information that removes foreground noise and selects background labels from the border superpixels.

We first use the SLIC algorithm to abstract the image into N uniform and compact regions. Then we select the superpixels whose centroids lie within a certain number of pixels of the image border as the background template. Since the most prominent boundaries in an image are likely to lie on the contour between object and background, we can roughly remove the template superpixels with strong boundaries, regarding them as foreground noise.
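A minimal sketch of this abstraction step, assuming scikit-image's SLIC implementation is available; the function name and the `border_width` margin are illustrative choices rather than values fixed by the paper:

```python
import numpy as np
from skimage.segmentation import slic

def border_background_template(image, n_segments=200, border_width=15):
    """Abstract the image into superpixels and collect those whose
    centroids lie near the image border as the background template."""
    labels = slic(image, n_segments=n_segments, compactness=10)
    h, w = labels.shape
    template = []
    for sp in np.unique(labels):
        ys, xs = np.nonzero(labels == sp)
        cy, cx = ys.mean(), xs.mean()  # superpixel centroid
        if (cy < border_width or cy > h - border_width or
                cx < border_width or cx > w - border_width):
            template.append(sp)
    return labels, template
```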

We adopt the probability of boundary (PB for short) [25] to detect image boundaries. The boundary feature of the i-th superpixel is the average PB value of the pixels along the edge contour of superpixel i, as follows:

$$ PB_{i} = \frac{1}{{\left| {B_{i} } \right|}}\sum\nolimits_{{I \in B_{i} }} {I^{pb} } $$
(1)

where \( B_{i} \) is the edge pixel set of superpixel i, \( \left| {B_{i} } \right| \) is the number of pixels in this set, and \( I^{pb} \) is the PB value of pixel \( I \). The larger the PB value, the more pronounced the boundary feature of the superpixel. Since superpixels with pronounced boundary features are more likely to belong to the object, we remove the superpixels whose boundary features are larger than the adaptive gray threshold derived by Otsu's method [26]. The remaining superpixels in the background template are then selected as background labels. As shown in Fig. 2, the selected background labels contain less foreground noise than the background template.
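The label selection of Eq. (1) could be sketched as follows, assuming a precomputed per-pixel boundary-probability map `pb_map` (e.g. the output of the PB detector [25]); recovering superpixel contours via scikit-image's `find_boundaries` is an assumption of this sketch:

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.segmentation import find_boundaries

def select_background_labels(labels, template, pb_map):
    boundary = find_boundaries(labels, mode='inner')
    pb = {}
    for sp in template:
        edge = boundary & (labels == sp)           # contour pixels of sp
        # Eq. (1): mean PB value along the superpixel's contour.
        pb[sp] = pb_map[edge].mean() if edge.any() else 0.0
    # Drop superpixels above the adaptive Otsu threshold [26]; these
    # likely straddle the object contour (foreground noise).
    t = threshold_otsu(np.asarray(list(pb.values())))
    return [sp for sp in template if pb[sp] <= t]
```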

Fig. 2. Illustration of the main phases of our method. (a) Input image. (b) Superpixels generated by SLIC. (c) The convex hull. (d) The prior map based on the background template. (e) The prior map based on the selected background labels. (f) The foreground-prior map.

3.2 Background-Prior Saliency

If a superpixel has a larger color difference to the background labels, it is more likely to be salient. In addition, background labels contribute more to the saliency values of nearby superpixels and less to those of distant ones. Therefore, we use spatially weighted color contrast to define the saliency of superpixel \( p_{i} \) as follows:

$$ S_{i} = \sum\nolimits_{j = 1,j \ne i}^{N} {d_{c} \left( {p_{i} ,p_{j} } \right)} \times \exp \left( {1 - d_{s} \left( {p_{i} ,p_{j} } \right)} \right) $$
(2)

where \( d_{c} \left( {p_{i} ,p_{j} } \right) \) and \( d_{s} \left( {p_{i} ,p_{j} } \right) \) are respectively the Euclidean color and spatial distances between the i-th superpixel and the j-th superpixel, which belongs to the background template. Both distances are normalized to [0, 1].
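A direct sketch of Eq. (2), assuming per-superpixel mean Lab colors and centroid positions, both normalized to [0, 1] as stated above; `background_prior` is a hypothetical name and the final rescaling is an illustrative choice:

```python
import numpy as np

def background_prior(colors, pos, bg_labels):
    """Spatially weighted color contrast to the background labels, Eq. (2)."""
    sal = np.zeros(len(colors))
    for i in range(len(colors)):
        for j in bg_labels:
            if i == j:
                continue
            dc = np.linalg.norm(colors[i] - colors[j])  # color distance d_c
            ds = np.linalg.norm(pos[i] - pos[j])        # spatial distance d_s
            sal[i] += dc * np.exp(1.0 - ds)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)  # to [0, 1]
```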

3.3 Foreground-Prior Saliency

We use image corner points to select foreground seeds. The traditional Harris point detector only considers the gray-level information of the image, which leads to many invalid corner or contour points in complex scenes. The color boosted Harris detector combines brightness and color information so that most detected points lie around salient objects, making the result more stable. Therefore, in this paper we use the color boosted Harris point detector [27] to detect corners or contour points of salient objects in the image.

Since salient points are usually not located on the image boundary, we eliminate those near the boundary and enclose all remaining salient points to compute a convex hull. Then we select the foreground seeds based on the differences between superpixels inside and outside the convex hull. We denote the set of superpixels inside the convex hull by \( I \) and the set outside the convex hull by \( O \). The difference between superpixels inside and outside the convex hull is defined as follows:

$$ {\text{w}}\left( {p_{i} ,p_{j} } \right) = \exp \left( { - \frac{1}{{2\sigma^{2} }}\left| {\left| {p_{i} - p_{j} } \right|} \right|^{2} } \right),i \in I,j \in O $$
(3)
$$ d_{i} = \sum\nolimits_{j \in O} {{\text{w}}\left( {p_{i} ,p_{j} } \right) \cdot \left| {\left| {c_{i} - c_{j} } \right|} \right|} ,i \in I $$
(4)

where \( \left| {\left| {c_{i} - c_{j} } \right|} \right| \) is the Euclidean color distance between the i-th superpixel, which belongs to the convex hull, and the j-th superpixel outside it, and \( {\text{w}}\left( {p_{i} ,p_{j} } \right) \) is a weight based on their Euclidean spatial distance. Both distances are normalized to [0, 1]. We remove the superpixels in the convex hull whose difference \( d_{i} \) is less than the average difference and regard the remaining superpixels as foreground seeds. The average difference is defined as \( \upalpha = \frac{1}{{\left| I \right|}}\sum\nolimits_{j \in I} {d_{j} } \), where \( \left| I \right| \) is the number of superpixels in the convex hull. Figure 2 shows the foreground-prior map of our method.
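A sketch of the seed selection of Eqs. (3)-(4); `inside` and `outside` are index sets for superpixels inside and outside the convex hull, and the value of the tuning parameter sigma is an assumption of this sketch:

```python
import numpy as np

def foreground_seeds(colors, pos, inside, outside, sigma=0.25):
    d = np.zeros(len(inside))
    for k, i in enumerate(inside):
        for j in outside:
            # Eq. (3): Gaussian weight on the spatial distance.
            w = np.exp(-np.sum((pos[i] - pos[j]) ** 2) / (2 * sigma ** 2))
            d[k] += w * np.linalg.norm(colors[i] - colors[j])   # Eq. (4)
    alpha = d.mean()       # average difference over superpixels in the hull
    # Keep the superpixels whose difference is at least the average.
    return [i for k, i in enumerate(inside) if d[k] >= alpha]
```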

3.4 Graph Construction

We create a graph \( {\text{G}} = \left( {{\text{V}},{\text{E}}} \right) \) with N nodes \( \left\{ {n_{1} ,n_{2} , \cdots ,n_{N} } \right\} \) and edges \( {\text{E}} \). Node \( n_{i} \) corresponds to the i-th image superpixel, and edge \( e_{ij} \) links nodes \( n_{i} \) and \( n_{j} \). The similarity of two nodes is measured by a distance defined on the mean color features of each region. We define \( w_{ij} \) as the weight of the edge between nodes \( n_{i} \) and \( n_{j} \); the affinity matrix \( {\text{W}} = \left[ {w_{ij} } \right]_{N \times N} \) indicates the similarity between superpixels:

$$ w_{ij} = \left\{ {\begin{array}{*{20}l} {\exp \left( { - \frac{{||c_{i} - c_{j} ||_{2} }}{{\sigma^{2} }}} \right),} \hfill & {j \in N\left( i \right)} \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(5)

where \( N\left( i \right) \) is the set of neighboring nodes of superpixel i, \( ||c_{i} - c_{j} ||_{2} \) is the average color distance of the two superpixels in the CIE LAB color space, and \( \sigma^{2} \) is a tuning parameter that controls the strength of the similarity. The degree matrix of graph \( {\text{G}} \) is \( {\text{D}} = {\text{diag}}\left\{ {d_{1} ,d_{2} , \cdots ,d_{N} } \right\} \), where \( d_{i} = \sum\nolimits_{j} {w_{ij} } \) is the degree of node i, and the row-normalized affinity matrix is:

$$ {\text{A}} = D^{ - 1} \times W $$
(6)
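A sketch of the construction of W, D and A per Eqs. (5)-(6), assuming an adjacency list `adjacency[i]` holding the neighbor indices N(i) and an illustrative value for the tuning parameter \( \sigma^{2} \):

```python
import numpy as np

def build_affinity(colors, adjacency, sigma2=0.1):
    n = len(colors)
    W = np.zeros((n, n))
    for i in range(n):
        for j in adjacency[i]:
            # Eq. (5): Lab color similarity between neighboring nodes.
            W[i, j] = np.exp(-np.linalg.norm(colors[i] - colors[j]) / sigma2)
    d = W.sum(axis=1)                    # node degrees d_i
    A = W / (d[:, None] + 1e-12)         # Eq. (6): A = D^{-1} W
    return W, A
```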

3.5 Label Propagation Saliency

In [28], a label propagation method is proposed that infers information about unlabeled nodes from the provided labeled nodes. Given a data set \( X = \left\{ {x_{1} , \cdots ,x_{i} ,x_{i + 1} , \cdots ,x_{n} } \right\} \in R^{m \times n} \), the first \( i \) data points are labeled and the rest are assigned values by propagation according to their relevance to the labeled points. We seek a function \( V = \left[ {V_{{m_{1} }} ,V_{{m_{2} }} , \cdots ,V_{{m_{N} }} } \right]^{T} \), \( V:{\text{R}} \to \left[ {0,1} \right] \in R^{N \times 1} \), that indicates how similar each data point is to the labels. The similarity measure \( V\left( {r_{i} } \right) \) satisfies:

$$ V_{t + 1} \left( {r_{i} } \right) = \sum\nolimits_{j = 1}^{N} {w_{ij} V_{t} \left( {r_{j} } \right)} $$
(7)

where \( w_{ij} \) is the affinity entry defined in Eq. 5 and t is the recursion step. The similarity measure of the labeled nodes is fixed to 1 throughout the recursion, and the initial measure of the unlabeled nodes is set to 0. The final similarity of a region to the labels is influenced by the features of its surroundings; in other words, the similarity \( V\left( {r_{i} } \right) \) is iteratively learned by propagating the similarity measures \( V\left( {r_{j} } \right) \) of its neighbors.
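A sketch of the recursion in Eq. (7); propagating with the row-normalized affinity A of Eq. (6) so that values stay bounded is an assumption here, as are the iteration cap and tolerance:

```python
import numpy as np

def propagate_labels(A, seed_idx, n_iter=200, tol=1e-5):
    """Iterate Eq. (7); labeled nodes stay clamped at 1, others start at 0."""
    V = np.zeros(A.shape[0])
    V[seed_idx] = 1.0
    for _ in range(n_iter):
        V_new = A @ V                 # V_{t+1}(r_i) = sum_j w_ij V_t(r_j)
        V_new[seed_idx] = 1.0         # similarity of labeled nodes fixed to 1
        done = np.abs(V_new - V).max() < tol
        V = V_new
        if done:
            break
    return V
```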

Then, we integrate the two saliency maps: the foreground-propagated one, which highlights the whole object, and the background-propagated one, which reduces the background noise, as follows:

$$ {\text{S}} = S_{f} + S_{b} $$
(8)

where \( S_{f} \) is the foreground-propagated saliency map and \( S_{b} \) is the background-propagated saliency map.
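The integration of Eq. (8) could be wired up as below, reusing `propagate_labels` from the sketch above; thresholding each prior map at its mean to obtain propagation seeds is an assumption of this sketch, not a detail fixed by the paper:

```python
import numpy as np

def final_saliency(A, fg_prior, bg_prior):
    s_f = propagate_labels(A, np.flatnonzero(fg_prior >= fg_prior.mean()))
    s_b = propagate_labels(A, np.flatnonzero(bg_prior >= bg_prior.mean()))
    s = s_f + s_b                                  # Eq. (8): S = S_f + S_b
    return (s - s.min()) / (s.max() - s.min() + 1e-12)  # rescale to [0, 1]
```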

4 Experimental Results

In this section, we evaluate our method on two publicly available data sets. One is the MSRA [19] data set, which contains 1000 images equipped with pixel-wise ground truth. The other is the ECSSD [29] data set, which contains 1000 natural images with complex backgrounds.

We compare the performance of the proposed method with 7 other state-of-the-art methods: IT [4], FT [19], BSCA [12], GBMR [20], BFS [22], DSR [21], and LPS [28]. The evaluation metrics are the precision-recall (PR) curve and the F-measure. For each method, we segment the saliency map at every threshold between 0 and 255 and compare the resulting binary maps with the ground truth to obtain the precision-recall curve. The F-measure comprehensively evaluates precision and recall together; we set \( \beta^{2} = 0.3 \) [5] to emphasize precision:

$$ F{\text{-}}measure = \frac{{\left( {1 + \beta^{2} } \right) \cdot Precision \cdot Recall}}{{\beta^{2} \cdot Precision + Recall}} $$
(9)
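The evaluation protocol can be sketched as follows; sweeping every integer threshold in [0, 255] follows the description above, while reporting the maximum F-measure over thresholds is an illustrative choice:

```python
import numpy as np

def pr_curve_and_fmeasure(sal, gt, beta2=0.3):
    """Binarize a saliency map at each threshold and score it against gt."""
    sal = np.round(255 * (sal - sal.min()) / (sal.max() - sal.min() + 1e-12))
    gt = gt > 0
    precision, recall, f = [], [], []
    for t in range(256):
        binary = sal >= t
        tp = np.logical_and(binary, gt).sum()
        p = tp / (binary.sum() + 1e-12)
        r = tp / (gt.sum() + 1e-12)
        precision.append(p)
        recall.append(r)
        f.append((1 + beta2) * p * r / (beta2 * p + r + 1e-12))  # Eq. (9)
    return precision, recall, max(f)
```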

Figure 3 shows the precision-recall (PR for short) curves and F-measure comparisons of our method and the other methods on the MSRA-1000 and ECSSD data sets. For the MSRA-1000 data set, the image backgrounds are relatively simple, and the PR curve of our method is clearly superior to those of the other saliency detection algorithms. Although the LPS and DSR methods also achieve good PR and F-measure results, the F-measure of our method is 1.7% and 2.1% higher than those of LPS and DSR respectively. For the ECSSD data set, the image backgrounds are more complicated. Although both our method and LPS achieve good results, our method reaches 77% precision and 68% recall, higher than the 71% precision and 64% recall of LPS.

Fig. 3. Quantitative PR-curve and F-measure evaluation of different methods on two data sets: (a) MSRA, (b) ECSSD.

Figure 4 shows several images for visual comparison with previously published methods. On these samples, our method achieves the best performance. Most saliency methods can detect the complete salient objects when the background is relatively simple; when the background is more complicated, their results either contain background noise or are incomplete. Since we consider both foreground and background information, our method effectively suppresses background noise. With the help of label propagation, our method assigns high saliency values to candidate objects based on the differences between the labels. Furthermore, our results are closer to the ground truth and preserve more complete boundaries of the salient objects.

Fig. 4. Visual comparison of our method with seven state-of-the-art methods. From left to right: (a) Input image, (b) Ground Truth, (c) IT [4], (d) FT [19], (e) GBMR [20], (f) BFS [22], (g) DSR [21], (h) BSCA [12], (i) LPS [28], (j) Ours.

5 Conclusion

In this paper, we propose a novel saliency detection algorithm based on foreground and background propagation. First, we select the image border superpixels to obtain background information and compute a background-prior map. Second, we use salient points detected by the color boosted Harris algorithm to obtain a convex hull and compute a foreground-prior map. Third, we use a label propagation algorithm to propagate saliency over the two prior maps respectively. The final saliency map is obtained by integrating the foreground- and background-propagated saliency maps. Results on two benchmark data sets show that our method achieves superior performance compared with state-of-the-art methods.