Bagging-based saliency distribution learning for visual saliency detection

https://doi.org/10.1016/j.image.2020.115928

Highlights

  • We propose a novel bagging-based saliency distribution learning framework for visual saliency detection.

  • We propose a foreground consistency saliency optimization framework to further refine the saliency map.

  • An effective prejudgment mechanism is developed to improve computational efficiency.

  • Experimental results on four datasets indicate the effectiveness of the proposed method.

Abstract

Saliency detection remains very challenging in computer vision and image processing. In this paper, we propose a novel visual saliency detection framework via bagging-based saliency distribution learning (BSDL). Given an input image, we first segment it into superpixels as basic units. Two priors, the background prior and the center prior, are then integrated to generate an initial prior map, which is used to select training samples from all superpixels to train the BSDL model. Specifically, the BSDL contains two stages. In the first stage, we use a bagging-based sampling method to train K saliency classifiers from all training samples; these K classifiers are used to predict the saliency value of each superpixel. In the second stage, we learn a saliency distribution model whose goal is to infer the relationship between each classifier and each superpixel. That is, for each superpixel, the BSDL not only trains K saliency classifiers to predict its saliency value, but also infers the reliability of each classifier for that prediction. As a result, each superpixel's saliency value is determined by its K predicted saliency values and its saliency distribution. After the BSDL, we propose a foreground consistency saliency optimization framework (FCSO) to further refine the saliency map obtained by BSDL. To improve computational efficiency, a prejudgment rule is proposed to evaluate the quality of the saliency map obtained by BSDL, which decides whether the FCSO is needed for the input image. Experimental results on four public datasets demonstrate the superiority of the proposed method over other state-of-the-art methods.

Introduction

Saliency detection is still an unsolved problem in computer vision and image processing. It aims to locate the most interesting regions in an image, and thus contributes to subsequent computer vision and image processing tasks, such as image retrieval [1], action recognition [2], image segmentation [3], video saliency detection [4], [5], [6], [7], and so on. In summary, state-of-the-art saliency detection methods follow two strategies: top-down [8], [9], [10], [11], [12], [13] and bottom-up [14], [15], [16], [17], [18], [19], [20], [21].

Top-down methods are usually driven by specific tasks and involve a supervised learning framework. They aim to learn a saliency model from numerous training images with ground truth. Deep learning based methods are the most popular top-down methods and have achieved promising performance in recent years: owing to their hierarchical architecture, deep neural networks can effectively exploit high-level semantic information from training images. In contrast, bottom-up methods are faster and simpler than top-down ones because no training images are needed. They mainly exploit various low-level features, such as color, texture, and gradient features, together with various kinds of prior knowledge, such as the background prior, center prior, and contrast prior. Furthermore, machine learning algorithms are widely applied in bottom-up methods (MLBU), such as bootstrap learning [22], multiple instance learning [23], the Bayesian framework [24], and so on. The flow of these MLBU methods is summarized as follows: given an input image, they first utilize prior knowledge to select some regions of the input image as training samples; based on various machine learning algorithms, the selected training samples are then used to train a saliency model that classifies each region of the input image as foreground or background.
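The generic MLBU flow summarized above can be sketched as follows. This is only an illustrative sketch: the SVM classifier, the probability-based scoring, and the fixed thresholds are placeholder assumptions, not the design of any particular cited method.

```python
import numpy as np
from sklearn.svm import SVC

def mlbu_pipeline(features, prior_map, hi=0.8, lo=0.2):
    """Generic MLBU flow: select confident regions via a prior map,
    train a classifier on them, then score every region.

    features  : (n_regions, d) per-region descriptors
    prior_map : (n_regions,) prior saliency in [0, 1]
    Returns a (n_regions,) saliency score in [0, 1].
    """
    fg = prior_map >= hi                       # confident foreground samples
    bg = prior_map <= lo                       # confident background samples
    X = np.vstack([features[fg], features[bg]])
    y = np.concatenate([np.ones(fg.sum()), np.zeros(bg.sum())])
    clf = SVC(probability=True).fit(X, y)
    # probability of the foreground class serves as the saliency score
    return clf.predict_proba(features)[:, 1]
```

Regions with an uncertain prior (between `lo` and `hi`) are deliberately excluded from training but still scored by the learned model.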

Nevertheless, the above methods become powerless when the image content is complex. That is, for a complex image it is hard to train a unified saliency model to classify each region as foreground or background, because these methods ignore the different characteristics of the various regions in complex images. For example, in Fig. 1(a), salient region A differs greatly in feature from another salient region B but has features similar to background region C. This means it is difficult to train a unified saliency model for Fig. 1(a) that separates salient regions from the background, because the features of the various regions are very diverse. The same situation occurs in Fig. 1(b). Thus, for complex images whose regions have diverse features, how to train an effective saliency model to separate foreground from background is a challenging but important issue for MLBU methods.

To deal with the above problems, we propose a novel visual saliency detection framework via bagging-based saliency distribution learning (BSDL). In our method, the input image is first segmented into superpixels as basic units (a superpixel represents a region), and each superpixel is represented by deep features extracted from the pre-trained VGG19 net [25]. Two well-known priors, the background prior and the center prior, are then integrated to generate an initial prior map, which helps select superpixels from the input image as training samples by setting an adaptive threshold. Secondly, the training samples are used to train the BSDL model, which contains two stages: (1) To improve the generalization ability of the saliency model, we use a bagging-based sampling method to train K saliency classifiers for the input image, i.e., we randomly select a subset of all training samples as the training set for each classifier, so that each saliency classifier corresponds to its own training set. (2) Furthermore, we propose a saliency distribution learning method to infer the reliability of each saliency classifier for predicting each superpixel's saliency value. That is, in the BSDL, for a given superpixel we not only train K classifiers to produce its K predicted saliency values but also learn its saliency distribution, which infers the reliability of each classifier for that superpixel. Each superpixel's saliency value is thus determined by its K predicted saliency values and its saliency distribution. Compared with previous works, the BSDL first constructs K saliency classifiers for the input image and then learns to find the most appropriate classifiers for each superpixel to predict its saliency value. This is undoubtedly more effective when the input image contains superpixels with different features, as in Fig. 1.
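The two-stage idea above can be sketched in code. Note the heavy caveat: the paper's saliency distribution is learned, whereas this sketch substitutes a simple hand-crafted reliability proxy (a softmax over each superpixel's distance to each classifier's training subset); the SVM base classifier and all parameters are likewise illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def bsdl_predict(features, train_X, train_y, K=5, subset=0.7, seed=0):
    """Bagging-style sketch of BSDL: train K SVM saliency classifiers on
    random subsets of the training samples, then fuse their per-superpixel
    predictions with a per-superpixel weight vector over the K classifiers
    (a stand-in for the learned saliency distribution)."""
    rng = np.random.default_rng(seed)
    n = len(train_y)
    m = int(subset * n)
    preds, dists = [], []
    for _ in range(K):
        idx = rng.choice(n, size=m, replace=True)   # bootstrap sample
        while len(np.unique(train_y[idx])) < 2:     # ensure both classes drawn
            idx = rng.choice(n, size=m, replace=True)
        clf = SVC(probability=True).fit(train_X[idx], train_y[idx])
        preds.append(clf.predict_proba(features)[:, 1])
        # distance from each superpixel to this classifier's training subset:
        # a proxy for how reliable this classifier is for that superpixel
        d = np.min(np.linalg.norm(features[:, None] - train_X[idx][None], axis=2), axis=1)
        dists.append(d)
    preds = np.stack(preds)                     # (K, n_superpixels)
    w = np.exp(-np.stack(dists))                # closer training set => more reliable
    w /= w.sum(axis=0, keepdims=True)           # per-superpixel distribution over K
    return (w * preds).sum(axis=0)              # distribution-weighted fusion
```

Each column of `w` sums to one, so every superpixel's final score is a convex combination of its K predictions, mirroring the role the saliency distribution plays in the text.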

The BSDL takes each superpixel as an individual instance without exploring the spatial relationships between superpixels, so a saliency optimization method is then used to further improve the quality of the saliency map obtained by BSDL. Previous optimization methods usually assign similar saliency values to adjacent superpixels with similar features, which makes it hard to enforce saliency consistency among foreground superpixels when the salient object consists of multiple regions with different features. Different from previous works, we propose a foreground consistency saliency optimization framework (FCSO) to further refine the saliency map obtained by BSDL. Two novel optimization matrices, a local structure matrix and a spatial compactness matrix, are proposed to exploit saliency cues from a local structural perspective and a global spatial perspective. The proposed FCSO enforces saliency consistency among foreground superpixels better than previous works. To improve computational efficiency, a prejudgment mechanism is also proposed to evaluate the quality of the saliency map obtained by BSDL, which decides whether the FCSO is needed for the input image. In summary, the contributions of the proposed method are as follows:

  • (1)

    The first contribution is the development of the bagging-based saliency distribution learning model (BSDL). Given an input image, K classifiers are first trained to predict each superpixel's saliency value using the bagging-based method. For each superpixel, we also learn its saliency distribution, which infers the reliability of each classifier for predicting its saliency value. Each superpixel's saliency value is determined by its K predicted saliency values and its saliency distribution. The BSDL deeply analyzes the different characteristics of the various superpixels in the input image, making it more effective than previous works on complex images.

  • (2)

    The second contribution is a foreground consistency saliency optimization framework (FCSO) that further improves the quality of the saliency map obtained by BSDL. In the FCSO, a new local structure matrix and a new spatial compactness matrix are developed to update all superpixels' saliency values.

  • (3)

    The third contribution is the development of an effective prejudgment mechanism that evaluates the performance of the saliency map obtained by BSDL and helps decide whether the FCSO is needed for the input image.

Section snippets

Related work

Deep learning based methods have achieved outstanding performance in recent years. Wang et al. [26] construct two CNN frameworks to exploit saliency cues: a global search network and a local estimation network. He et al. [27] learn a CNN framework named Super-CNN to construct a superpixel-level saliency map. Hou et al. [8] introduce short connections into the skip-layer structure within a hierarchical architecture. In [9], multi-scale deep features are learned from a CNN to

Bagging-based saliency distribution learning (BSDL)

The given image is segmented into superpixels as basic units in our method. We first integrate two priors to obtain an initial prior map, which provides an indicator for subsequent training sample selection. We then use the selected training samples to train the BSDL model, which contains two stages: (1) based on a bagging-based sampling method, we train K classifiers from the training samples, each of which is used to predict whether each superpixel is foreground or background (1/0). (2)
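The training-sample selection step described above can be sketched as follows. The exact adaptive threshold rule is not given in this excerpt; a common choice, assumed here purely for illustration, is a multiple of the mean prior value.

```python
import numpy as np

def select_training_samples(prior_map, alpha_fg=1.5, alpha_bg=0.5):
    """Pick confident superpixels from the initial prior map with an
    image-adaptive threshold (a multiple of the mean prior value -- an
    assumption; the paper's exact rule is not shown in this excerpt).

    prior_map : (n_superpixels,) prior saliency in [0, 1]
    Returns boolean masks (foreground samples, background samples).
    """
    mu = prior_map.mean()
    fg = prior_map >= alpha_fg * mu   # confidently salient superpixels
    bg = prior_map <= alpha_bg * mu   # confidently background superpixels
    return fg, bg
```

Because the threshold scales with the image's own prior statistics, bright, high-prior images and dim, low-prior images both yield usable positive and negative sample sets.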

Foreground consistency saliency optimization (FCSO)

The BSDL takes each superpixel as an individual instance without exploiting the spatial relationships between superpixels; therefore, we propose a foreground consistency saliency optimization framework (FCSO) to further refine the saliency result obtained by BSDL. Previous optimization methods usually assign similar saliency values to adjacent superpixels with similar features; however, they are hard-pressed to enforce saliency consistency between foreground
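The refinement idea can be illustrated with a generic graph-based smoothing step. To be clear, this is a stand-in: the paper's local structure matrix and spatial compactness matrix are not reproduced here; an arbitrary affinity matrix `W` plays their role, and the quadratic objective below is a standard formulation, not the FCSO itself.

```python
import numpy as np

def refine_saliency(F, W, lam=0.5):
    """Generic graph-smoothing refinement (a stand-in for FCSO):
    minimize  sum_ij W_ij (s_i - s_j)^2 + lam * sum_i (s_i - F_i)^2,
    whose closed form is  s = lam * (L + lam*I)^-1 F  with the graph
    Laplacian L = D - W.

    F : (n,) initial saliency values (e.g., the BSDL output)
    W : (n, n) symmetric nonnegative superpixel affinity matrix
    """
    D = np.diag(W.sum(axis=1))        # degree matrix
    L = D - W                         # graph Laplacian
    n = len(F)
    return lam * np.linalg.solve(L + lam * np.eye(n), F)
```

The smoothness term pulls connected superpixels toward a common saliency value, while the fidelity term (weighted by `lam`) keeps the result close to the initial map.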

Prejudgment mechanism

In some cases, the FCSO fails to improve the quality of the saliency map F obtained by BSDL; in some images it even yields a worse saliency map than the BSDL, as in Fig. 6. To address this problem, we construct a prejudgment mechanism to evaluate the performance of the saliency map F obtained by BSDL, which determines whether the FCSO is needed.

Generally, there is a great contrast between the salient object and the background in a good saliency map, i.e., the saliency values of most
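A contrast-based prejudgment of the kind described above can be sketched as follows. The paper's exact criterion is not shown in this excerpt; the quartile-gap measure and the threshold `tau` below are illustrative assumptions.

```python
import numpy as np

def needs_refinement(F, tau=0.6):
    """Prejudgment sketch: a good saliency map is strongly bimodal, with
    most values near 0 (background) or near 1 (salient object).  Contrast
    is measured here as the gap between the mean of the top and bottom
    quartiles of saliency values -- an assumed proxy, not the paper's
    exact rule.  Low contrast -> apply FCSO; high contrast -> skip it.

    F : (n,) saliency values in [0, 1] from BSDL
    """
    q = np.quantile(F, [0.25, 0.75])
    lo_mean = F[F <= q[0]].mean()     # typical background saliency
    hi_mean = F[F >= q[1]].mean()     # typical foreground saliency
    return (hi_mean - lo_mean) < tau  # True => run FCSO
```

A sharply bimodal map yields a large gap and skips the optimization, which is how the prejudgment saves computation on easy images.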

Experiments

We compare the proposed method with 15 other state-of-the-art methods: LEGs [26], S-CNN [27], BSCA [31], TLLT [38], LPS [18], MAP [39], MST [40], KSR [41], LDS [42], SMD [43], MILP [23], DGLS [33], HCA [28], AE [17], and FCB [44]. Among them, LEGs, S-CNN, KSR, AE, and HCA exploit saliency cues using deep neural networks (DNNs); MST, LDS, MILP, DGLS, and SMD are saliency detection methods using classical machine learning algorithms or mathematical theories; and BSCA, TLLT, LPS, and MAP are

Conclusion

In this paper, we propose a novel saliency detection framework via bagging-based saliency distribution learning (BSDL). The input image is segmented into superpixels as basic units. First, we construct an initial prior map that roughly extracts saliency cues by integrating two priors. The initial prior map is used to select superpixels from the input image as training samples, which are used to train the BSDL model: (1) we use a bagging-based sampling method to train K SVM

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant nos. 61701101, 61973093, U1713216, 61901098, 61971118, the Fundamental Research Fund for the Central Universities of China N2026005, N181602014, N2026004, N2026006, N2026001, N2011001, and the project for the science and technology major special plan of Liaoning 2019JH1/10100005.

References (47)

  • Chen, C., et al., Video saliency detection via spatial–temporal fusion and low-rank coherency diffusion, IEEE Trans. Image Process. (2017)
  • Q. Hou, M. Cheng, X. Hu, A. Borji, Z. Tu, P. Torr, Deeply supervised salient object detection with short connections, ...
  • G. Li, Y. Yu, Visual saliency based on multi-scale deep features, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., ...
  • Wang, L., et al., Salient object detection with recurrent fully convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. (2019)
  • G. Li, Y. Yu, Deep contrast learning for salient object detection, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., ...
  • T. Wang, L. Zhang, S. Wang, H. Lu, G. Yang, X. Ruan, A. Borji, Detect globally, refine locally: a novel approach to ...
  • R. Achanta, S. Hemami, F. Estrada, Frequency-tuned salient region detection, in: Proc. IEEE Conf. Comput. Vis. Pattern ...
  • Cheng, M., et al., Global contrast based salient region detection, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
  • C. Yang, L. Zhang, H. Lu, X. Ruan, M. Yang, Saliency detection via graph-based manifold ranking, in: Proc. IEEE Conf. ...
  • Zhang, L., et al., Saliency detection via absorbing Markov chain with learnt transition probability, IEEE Trans. Image Process. (2018)
  • Li, H., et al., Inner and inter label propagation: salient object detection in the wild, IEEE Trans. Image Process. (2015)
  • Chen, C., et al., Structure-sensitive saliency detection via multilevel rank analysis in intrinsic feature space, IEEE Trans. Image Process. (2015)
  • Ma, G., et al., Salient object detection via multiple instance joint re-learning, IEEE Trans. Multimedia (2020)