1 Introduction

Saliency detection, which aims to identify the most visually attractive region in an image, plays an important role in computer vision. Its applications include image segmentation [12], target recognition [10] and image retrieval [13]. To extract salient objects from an input image, many visual attention models have been proposed [4, 6, 7, 8, 9, 11].

Since Itti et al. [7] proposed a visual attention system inspired by the animal visual system, research on saliency detection has developed rapidly. Ma and Zhang [9] designed a saliency detection method based on local contrast. Harel et al. [11] gave a graph-based visual saliency model. Cheng et al. [4, 6] proposed a saliency detection method based on global contrast and spatial coherence. Li et al. [8] employed hypergraphs to capture contextual attributes for saliency detection.

The contextual hypergraph modeling method for saliency detection (CHMS) [8] captures salient objects well in most images and is more accurate than many previous methods. In our experiments, we find that the CHMS method performs well for images with wide ranges of pixel-values (e.g. covering almost the whole range [0, 255]), as shown in the top three rows of Fig. 1. However, for images with narrow ranges of pixel-values (e.g. covering only the first, middle or last part of the range [0, 255]), the CHMS method usually performs poorly, as shown in the bottom three rows of Fig. 1. Further experiments show that the hypergraphs on different scales directly affect the results. For example, the images in the bottom three rows of Fig. 1 do not yield enough valid single-scale hypergraphs, so the CHMS method cannot perform well on them.

Fig. 1.

The experimental results of the CHMS method. The top three rows show images with wide ranges of pixel-values, and the bottom three rows show images with narrow ranges of pixel-values. (a) The input images; (b) The ground truth; (c) The saliency maps obtained by the CHMS method; (d)–(j) The single-scale hypergraphs on the fixed scales 0.15, 0.25, 0.35, 0.45, 0.55, 0.65 and 0.75, respectively. The CHMS method performs well for the images in the top three rows and poorly for the images in the bottom three rows.

Inspired by this observation, we propose an improved saliency detection method using hypergraphs on adaptive multiscales (HAM), which is an improvement on the CHMS method. The HAM method adaptively adjusts the ranges of pixel-values in the RGB channels and detects salient objects on adaptive multiscales. As shown in Fig. 2, the HAM method provides more valid single-scale hypergraphs and better final saliency maps. The experimental results show that the HAM method improves the performance of saliency detection, especially for images with narrow ranges of pixel-values. The basic idea of adaptively adjusting the ranges of pixel-values in an image may also be applicable to other tasks in computer vision or artificial intelligence.

Fig. 2.

The experimental results of the HAM method for images with narrow ranges of pixel-values. (a) The input images with narrow ranges of pixel-values; (b) The ground truth; (c) The saliency maps; (d)–(j) The single-scale hypergraphs on adaptive scales.

2 Contextual Hypergraph Modeling

The CHMS method uses a set of hyperedges to capture the contextual properties of superpixels, which significantly improves the performance of saliency detection. The CHMS method first segments an input image into superpixels by means of the SLIC method [2], which applies k-means clustering to generate superpixels efficiently. Then, it clusters the superpixels on a set of fixed scales by means of agglomerative mean-shift [1], which is based on an iterative query set compression mechanism and runs in linear time. Thus, an input image I is converted to a set of hypergraphs. Each hypergraph is denoted by \(G = (V, E)\), where \(V = \{v_i\}\) is a set of vertices corresponding to the superpixels, and \(E = \{e_j\}\) is a set of hyperedges (a hyperedge is a clique of superpixels) that satisfy \(\bigcup _{e_j\in E} e_j = V\). Thus, the saliency detection problem is converted to that of finding salient vertices and hyperedges in hypergraphs on multiple scales.

The saliency of a hyperedge can be determined by the gradient magnitudes of the superpixels within a narrow band along the boundary of the hyperedge. For any hyperedge \(e_j\), the saliency score of \(e_j\) is defined as follows.

$$\begin{aligned}&\varGamma (e_j)=\omega _{e_j}({{\Vert {I_g^{*}}\bullet {M_g(e_j)}\Vert }_1}-\rho (e_j)), \end{aligned}$$
(1)

where \(\omega _{e_j}\) is the weight of \(e_j\), \(I_g^{*}\) is the binary gradient map, \(M_g(e_j)\) is a binary mask indicating the superpixels within a narrow band along the boundary of the hyperedge \(e_j\), \(\bullet \) is the elementwise product operator, \({\Vert ~\Vert }_1\) is the 1-norm, and \(\rho (e_j)\) is a penalty factor equal to the number of superpixels in the intersection of the hyperedge \(e_j\) and the boundary superpixels of the input image.
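As a rough illustration of Eq. (1), the following Python sketch evaluates the score for one hyperedge on toy data. The function name `hyperedge_saliency`, the array layout (one entry per pixel or superpixel) and the toy values are assumptions made for this example; they are not taken from the CHMS implementation.

```python
import numpy as np

def hyperedge_saliency(grad_map, band_mask, weight, rho):
    """Sketch of Eq. (1): weight * (||I_g* . M_g(e_j)||_1 - rho(e_j))."""
    grad_map = np.asarray(grad_map, dtype=float)
    band_mask = np.asarray(band_mask, dtype=float)
    # The elementwise product keeps only gradient entries inside the boundary band.
    overlap = grad_map * band_mask
    # For binary inputs, the 1-norm is the number of entries that are 1 in both arrays.
    return weight * (np.abs(overlap).sum() - rho)

# Toy binary gradient map and band mask (illustrative values only).
I_g = np.array([[1, 0, 1],
                [0, 1, 0],
                [1, 1, 0]])
M_g = np.array([[1, 1, 0],
                [0, 1, 0],
                [0, 1, 1]])

score = hyperedge_saliency(I_g, M_g, weight=0.5, rho=1)
print(score)  # 0.5 * (3 - 1) = 1.0
```

In this toy case three entries are set in both arrays, so the 1-norm is 3, and the penalty of 1 lowers the score of a hyperedge that touches the image boundary.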

The saliency of a vertex (i.e. a superpixel) depends on the superpixel and its contexts. For any vertex \(v_i\) in a hypergraph, the saliency score of \(v_i\) is defined as follows.

$$\begin{aligned}&HSa(v_i)=\sum _{{e_j}\in E}\varGamma (e_j)h(v_i,e_j), \end{aligned}$$
(2)

where \(\varGamma (e_j)\) is the saliency score of \(e_j\), and \(h(v_i,e_j)\) is the element value in the \(i^{th}\) row and \(j^{th}\) column in the incidence matrix H. If \(v_i\in e_j\), then \(h(v_i,e_j) = 1\); otherwise \(h(v_i,e_j) = 0\).
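Because \(h(v_i,e_j)\) is an entry of the incidence matrix, Eq. (2) is a matrix-vector product between H and the vector of hyperedge scores. The sketch below shows this on a toy hypergraph; the helper names `incidence_matrix` and `vertex_saliency`, the hyperedges and the scores are hypothetical and chosen only for illustration.

```python
import numpy as np

def incidence_matrix(num_vertices, hyperedges):
    """Build H with H[i, j] = 1 if vertex i belongs to hyperedge j, else 0."""
    H = np.zeros((num_vertices, len(hyperedges)), dtype=int)
    for j, edge in enumerate(hyperedges):
        H[list(edge), j] = 1
    return H

def vertex_saliency(H, gamma):
    """Sketch of Eq. (2): HSa(v_i) = sum_j Gamma(e_j) * h(v_i, e_j)."""
    return H @ np.asarray(gamma, dtype=float)

# Four superpixels and three hyperedges whose union covers all vertices.
hyperedges = [{0, 1}, {1, 2, 3}, {3}]
H = incidence_matrix(4, hyperedges)

# Hypothetical hyperedge scores, e.g. as produced by Eq. (1).
gamma = [2.0, 0.5, 1.0]
hsa = vertex_saliency(H, gamma)
print(hsa)  # [2.  2.5 0.5 1.5]
```

A vertex that belongs to several salient hyperedges accumulates their scores, which is how the context of a superpixel enters its saliency.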

3 Saliency Detection Using Hypergraphs on Adaptive Multiscales

The CHMS method uses a set of empirical values [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75] as the fixed scales to detect salient objects in an image, which works well for most images. However, images differ greatly from one another. If all the pixel-values in an image lie in the first, middle or last part of the range [0, 255], there may be no hyperedges for some scales, which results in poor saliency detection, as shown in the last three rows of Fig. 1.

We propose an improved saliency detection method using hypergraphs on adaptive multiscales (HAM). Unlike the CHMS method, our HAM method uses a set of adaptive scales instead of fixed scales. Thus, it can obtain more single-scale hypergraphs for an input image than the CHMS method, which results in higher accuracy.

3.1 Adaptive Scaling of Pixel-Values

To adaptively adjust the ranges of pixel-values in the RGB channels, we compute a histogram of the pixel-values in each channel. The histogram is used to determine a range of pixel-values that covers more than 95% of the pixels in each channel. Restricting the range to this coverage avoids the influence of outliers [6]. The pixel-values outside the range are replaced by the nearest pixel-value within the range. Then, all the pixel-values are normalized and remapped to the range [0, 255]. For each channel i, \(i\in \{R,G,B\}\), the formula for normalization and remapping is as follows.

$$\begin{aligned}&\left( {\left( {\frac{I_i-{low}_i}{{high}_i-{low}_i}}\right) ^\gamma }\bullet ({high}_{i,out}-{low}_{i,out}) \right) +{low}_{i,out}, \end{aligned}$$
(3)

where \(I_i\) is a pixel-value in channel i of the input image I, \({low}_i\) is the lower bound of the pixel-values in channel i, \({high}_i\) is the upper bound of the pixel-values in channel i, \({low}_{i,out}\) is the lower bound of the remapped values in channel i, \({high}_{i,out}\) is the upper bound of the remapped values in channel i, and \(\gamma \) determines the shape of the mapping curve. Here, \({low}_{i,out} = 0\), \({high}_{i,out} = 255\), and \(\gamma = 1\), which gives a linear mapping.
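The per-channel clipping and remapping of Eq. (3) can be sketched in Python as follows. The text does not state how the range covering 95% of the pixels is chosen, so this sketch assumes a symmetric choice that cuts 2.5% from each tail using percentiles; the function name `adaptive_rescale`, its parameters and the treatment of constant channels are likewise assumptions for illustration.

```python
import numpy as np

def adaptive_rescale(image, coverage=0.95, gamma=1.0, low_out=0.0, high_out=255.0):
    """Clip each channel to a range covering `coverage` of its pixels, then apply Eq. (3)."""
    image = np.asarray(image, dtype=float)
    out = np.empty_like(image)
    # Assumption: the covering range is symmetric, dropping equal tails on both sides.
    tail = (1.0 - coverage) / 2.0 * 100.0
    for c in range(image.shape[-1]):
        channel = image[..., c]
        low, high = np.percentile(channel, [tail, 100.0 - tail])
        if high <= low:
            # Assumption: a (nearly) constant channel is left unchanged.
            out[..., c] = channel
            continue
        # Values outside [low, high] are replaced by the nearest bound.
        clipped = np.clip(channel, low, high)
        normalized = (clipped - low) / (high - low)
        out[..., c] = normalized ** gamma * (high_out - low_out) + low_out
    return out

# Synthetic image whose pixel-values occupy only the middle part of [0, 255].
rng = np.random.default_rng(0)
narrow = rng.integers(100, 141, size=(32, 32, 3))
stretched = adaptive_rescale(narrow)
print(narrow.min(), narrow.max())
print(stretched.min(), stretched.max())
```

After remapping, each channel of the synthetic image spans the full output range, which is the property the adaptive scaling step relies on.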

figure a

3.2 Construction of Hypergraphs on Adaptive Scales

The CHMS method constructs hypergraphs on a set of fixed scales by means of the Agglo-MS method; for details, refer to [14]. Unlike the CHMS method, we use a set of adaptive scales instead of fixed scales. We first adaptively adjust the ranges of pixel-values in the RGB channels of an image to get three remapped ranges of pixel-values, and then combine the remapped pixel-values with a set of fixed scales. The result of these two operations is equivalent to applying a set of adaptive scales to the original image. Thus, in the HAM method, we use hypergraphs on a set of adaptive scales to detect salient objects. The distance between any two pixels \(x_i\) and \(x_j\) on an adaptive scale \(\mu \) is computed as follows.

$$\begin{aligned}&dist = \left( {\frac{x_i-x_j}{\mu }}\right) ^2. \end{aligned}$$
(4)
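The equivalence between remapped pixel-values on a fixed scale and original pixel-values on an adaptive scale can be checked numerically. Under the linear mapping (\(\gamma = 1\)) and for values inside \([{low}_i, {high}_i]\), the remapping multiplies differences by \(255/({high}_i-{low}_i)\), so a fixed scale \(\mu \) on the remapped values acts like the scale \(\mu ({high}_i-{low}_i)/255\) on the original values. The following sketch verifies this; the helper names and the sample values are assumptions for illustration, and the units of \(\mu \) relative to the pixel-values are not specified in the text.

```python
import math

def scaled_distance(x_i, x_j, mu):
    """Eq. (4): squared difference of two pixel-values measured on scale mu."""
    return ((x_i - x_j) / mu) ** 2

def remap(x, low, high, low_out=0.0, high_out=255.0):
    """Linear case (gamma = 1) of the remapping, for values inside [low, high]."""
    return (x - low) / (high - low) * (high_out - low_out) + low_out

def effective_scale(mu, low, high, low_out=0.0, high_out=255.0):
    """Scale on the original values that matches fixed scale mu on remapped values."""
    return mu * (high - low) / (high_out - low_out)

fixed_scales = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75]
low, high = 100.0, 140.0   # hypothetical channel range of a narrow-range image
x_i, x_j = 112.0, 131.0    # two hypothetical pixel-values inside that range

pairs = []
for mu in fixed_scales:
    d_remapped = scaled_distance(remap(x_i, low, high), remap(x_j, low, high), mu)
    d_original = scaled_distance(x_i, x_j, effective_scale(mu, low, high))
    pairs.append((d_remapped, d_original))

print(all(math.isclose(a, b) for a, b in pairs))
```

For a narrow channel range the effective scale is smaller than the fixed scale, so differences between pixel-values that a fixed scale would treat as negligible become distinguishable.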

Adaptive scaling of the pixel-values can yield more single-scale hypergraphs for images with narrow ranges of RGB pixel-values (covering only the first, middle or last part of the range [0, 255]), which significantly improves the performance of saliency detection with the HAM method. Take the three images shown in the first column of Fig. 2 as examples. With the HAM method, the numbers of single-scale hypergraphs are 7, 7 and 6, respectively, as shown in Fig. 2; with the CHMS method, the numbers are 3, 2 and 2, respectively, as shown in the last three rows of Fig. 1. Comparing the results in Fig. 2 with the last three rows of Fig. 1, we conclude that the HAM method performs better than the CHMS method for images with narrow ranges of pixel-values.

Further experiments show that, for images with narrow ranges of pixel-values (all the pixel-values lie in the first, middle or last part of the range [0, 255]), the number of single-scale hypergraphs obtained by the HAM method is always greater than that obtained by the CHMS method; for images with wide ranges of pixel-values, the two methods always obtain similar numbers of single-scale hypergraphs.

3.3 Saliency Detection Using Hypergraphs on Adaptive Multiscales

The CHMS method uses hypergraphs on a set of fixed scales without considering the color differences between input images. For any input image, it takes the empirical values [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75] as the fixed scales. To detect salient objects according to the color differences in an image [15], we apply an adaptive scaling of the pixel-values and use hypergraphs on a set of adaptive scales. The HAM method significantly improves the performance of saliency detection for images with narrow ranges of RGB pixel-values.

figure b

In step 5, the LS-SVM-based saliency map is obtained with one of the alternative methods used in the CHMS method [8]. To achieve a fair comparison, we use the same method. For further details, refer to [8].

The HAM method is an improvement on the CHMS method. Comparisons between the HAM method and the CHMS method are illustrated in Fig. 3 (more experimental results are shown in Sect. 4). For the images in the first row of Fig. 3, the saliency map obtained by the HAM method contains less noise than that obtained by the CHMS method, and the boundaries of the CAUTION banner are smoother. For the images in the second row of Fig. 3, the background around the dog is usually marked as part of the salient object by previous methods, including the CHMS method, because the dog has color features similar to those of the background; the HAM method detects the salient object better. For the images in the third row of Fig. 3, the salient object is a flower, but the CHMS method marks the green leaves as salient because of their large size, whereas the HAM method highlights the flower and darkens the background. These results show that the HAM method captures salient objects better and improves the performance of saliency detection.

Fig. 3.

Three examples for the comparisons between the CHMS method and the HAM method. (a) The input images; (b) The saliency map obtained by the CHMS method; (c) The saliency map obtained by the HAM method.

4 Experimental Results and Analysis

The HAM method is compared with six state-of-the-art saliency detection methods, namely CHMS [8], GC [5], RC [15], MSS [3], FT [1] and LC [16], on the public datasets MSRA-1000 and SED-100, which are used as standard benchmarks. In addition, we use a new dataset, denoted IMNR, to evaluate the HAM method. The images in the IMNR dataset have narrow ranges of RGB pixel-values. Some experimental results are shown in Fig. 4. They show that the HAM method is the most competitive of the seven methods.

Fig. 4.

Comparisons between our HAM method and the six state-of-the-art methods, namely CHMS [8], GC [5], RC [15], MSS [3], FT [1] and LC [16], on the three datasets: MSRA-1000 (top six rows), SED-100 (middle two rows) and IMNR (bottom two rows).

We use precision-recall (PR) curves and F-measures to evaluate the quantitative performance of the six state-of-the-art methods and the HAM method. The F-measure is computed as follows.

$$\begin{aligned}&F_\beta = \frac{(1+\beta ^2){P}\times {R}}{{\beta ^2}P+R}, \end{aligned}$$
(5)

where P is the precision rate and R is the recall rate. We set \(\beta ^2 = 0.3\).
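For reference, Eq. (5) can be written as a small Python function. The name `f_measure`, the sample precision and recall values, and the convention of returning 0 when both rates are 0 are assumptions of this sketch.

```python
def f_measure(precision, recall, beta_sq=0.3):
    """Eq. (5): weighted harmonic mean of precision and recall, with beta^2 = 0.3."""
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Assumption: define the F-measure as 0 when precision and recall are both 0.
        return 0.0
    return (1.0 + beta_sq) * precision * recall / denominator

# Hypothetical precision and recall values for one saliency map.
score = f_measure(0.9, 0.8)
print(round(score, 4))
```

With \(\beta ^2 = 0.3\), precision is weighted more heavily than recall, so a map with high precision and moderate recall scores higher than one with the two values swapped.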

The quantitative performance of the seven methods on the IMNR dataset is shown in Fig. 5. The PR curves and F-measures show that the HAM method performs best among the seven methods on this dataset. As can be seen from Fig. 5, the HAM method always performs better than the CHMS method, and when the recall rate is greater than 0.7 it outperforms all six state-of-the-art methods. Furthermore, the saliency maps obtained by the HAM method are smoother and more robust than those of the other six methods.

Fig. 5.

The quantitative performance of the HAM method and the six state-of-the-art methods on the IMNR dataset.

Fig. 6.

The quantitative performances of the HAM method and the six state-of-the-art methods on the MSRA-1000 dataset.

Fig. 7.

The quantitative performance of the CHMS and HAM methods on the SED-100 dataset.

The quantitative performance of the seven methods on the MSRA-1000 dataset is shown in Fig. 6. According to the PR curves and F-measures on MSRA-1000, the HAM method again performs best among the seven methods, but its advantage is less pronounced because most images in MSRA-1000 have wide ranges of pixel-values and few have narrow ranges.

The quantitative performances of the CHMS and HAM methods on the SED-100 dataset are shown in Fig. 7. From the PR curves and F-measures, the HAM method consistently performs better than the CHMS method.

5 Conclusion

Previous saliency detection methods based on hypergraphs use a set of fixed scales without considering the color differences between input images. To detect salient objects according to the color differences in an image, we propose an improved saliency detection method based on hypergraphs on adaptive multiscales (HAM). Our HAM method adaptively adjusts the ranges of pixel-values in the RGB channels and uses hypergraphs on a set of adaptive scales to detect salient objects. The experimental results show that our HAM method significantly improves the performance of saliency detection for images with narrow ranges of RGB pixel-values. For images with wide ranges of pixel-values, the performance is also improved to some extent. The basic idea of our HAM method may also be applicable to other tasks in computer vision or artificial intelligence.