Elsevier

Neurocomputing

Volume 275, 31 January 2018, Pages 2227-2238

Stereoscopic saliency model using contrast and depth-guided-background prior

https://doi.org/10.1016/j.neucom.2017.10.052

Abstract

Many successful saliency models have been proposed to detect salient regions in 2D images. Because stereopsis, with its distinctive depth information, influences human viewing, stereoscopic saliency detection must consider depth as an additional cue. In this paper, we propose a 3D stereoscopic saliency model based on both contrast and a depth-guided-background prior. First, a depth-guided-background prior is detected from the disparity map, in addition to the conventional prior that assumes boundary super-pixels are background. Then, a disparity-based saliency, computed with the help of the proposed prior, is used to prioritize the contrasts among super-pixels. In addition, a scheme that combines disparity contrast and color contrast is presented. Finally, 2D spatial dissimilarity features are further employed to refine the saliency map. Experimental results on the PSU stereo saliency benchmark dataset (SSB) show that the proposed method outperforms existing saliency models.

Introduction

The human visual system (HVS) is remarkably adept at determining the important aspects of a scene as the eyes acquire information. To simulate the HVS mechanism, two distinct classes of saliency models are applied: bottom-up (stimulus-driven) and top-down (task-driven) [1], [2], [3], [4]. Bottom-up models extract center-surround difference features (local or global) from low-level information in different channels and then combine these feature maps to build a saliency map [5], [6], [7]. Although many image and video saliency models are used in vision tasks [8], [9], [10], [11], [12], with the development of 3D technologies, stereoscopic visual saliency models have attracted increasing attention for diverse applications such as synthetic vision [13], retargeting [14], rendering [15], quality assessment [16], [17], visual discomfort [18], [19], stereoscopic thumbnail creation [20] and disparity control [21]. However, research focusing on stereoscopic saliency models [22], [23], [24] remains scarce compared with the rapid expansion of saliency models for 2D images. Overall, much work remains to be done before 3D stereoscopic saliency models can approach the capabilities of the human visual system.

Because the sense of immersion [25] is enhanced by the binocular parallax generated by a stereo-channel-separated display, the binocular depth cues introduced by 3D displays have changed human viewing behavior [26], [27]. Therefore, this additional depth cue should be considered in a stereoscopic saliency model. Although current bottom-up stereoscopic saliency models are effective, several questions remain to be addressed:

  • How can a salient object be detected in a scene where the colors are similar? The colors of salient and non-salient objects are sometimes similar in natural images, so methods based on color contrast tend to obtain low values inside salient objects. In this situation, disparity cues can help to complete the whole salient object (e.g., the first row in Fig. 1(e)–(g)). Color and disparity are both useful in computing saliency and are, to some extent, relevant to each other. Because the interaction between disparity and 2D information is usually ignored, we introduce compactness [28] from the color map to measure this relationship while calculating contrast (e.g., the first row in Fig. 1(d)).

  • How can a salient object be detected in a scene with a cluttered background? Current bottom-up saliency methods have difficulty coping with images whose non-salient parts are cluttered (e.g., the second row of Fig. 1(a)). Previous models sometimes render non-salient parts as salient because of their high contrast with the surroundings (e.g., the second row in Fig. 1(e) and (f)). This can be avoided by using a disparity map (e.g., Fig. 1(g)). Although the boundary background prior [29], [30] has been verified as effective, we find that an additional prior can be extracted from the disparity map based on two observations: 1) non-salient areas distant from the viewer tend to form a smooth surface on the disparity map; 2) there is an obvious discontinuity between the salient and non-salient parts. Hence, taking multiple cues into consideration, including compactness and background priors, helps form a complete stereoscopic saliency analysis (e.g., the second row in Fig. 1(d)).

  • How can a salient object be detected from the disparity cue in a cluttered scene? For a simple scene, a good saliency map (first row of Fig. 1(g)) can be obtained from the disparity map alone (first row of Fig. 1(c)). However, disparity-based detection copes poorly with cluttered scenes (second row of Fig. 1(g)): some background regions receive high saliency values because they are close in depth to the salient ones. When the color feature (compactness [28]) and background priors are considered together, the non-salient background can be suppressed to a certain degree (e.g., Fig. 1(d)).
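The two observations about the disparity map (distant non-salient areas form a smooth surface; a depth discontinuity separates salient from non-salient parts) suggest a simple computational sketch. The following is an illustrative, hypothetical implementation of a depth-guided background prior, not the paper's exact formulation; the parameters `far_quantile` and `grad_thresh` are assumptions chosen for the example.

```python
import numpy as np

def depth_background_prior(disparity, far_quantile=0.3, grad_thresh=0.05):
    """Illustrative sketch of a depth-guided background prior: pixels that
    are distant from the viewer (low disparity) and lie on a locally
    smooth surface are marked as probable background."""
    # Normalize disparity to [0, 1]; larger values mean closer to the viewer.
    d = (disparity - disparity.min()) / (np.ptp(disparity) + 1e-8)
    # Observation 1: distant non-salient areas form a smooth surface.
    gy, gx = np.gradient(d)
    smooth = np.hypot(gx, gy) < grad_thresh
    # Observation 2: a depth discontinuity separates the salient object,
    # so keep only the far (low-disparity) portion as background.
    far = d <= np.quantile(d, far_quantile)
    return (smooth & far).astype(float)  # 1.0 = probable background
```

On a synthetic disparity map with a near foreground square over a flat distant plane, this marks the plane as background and leaves the square out.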

In this paper, we propose a unified stereoscopic saliency model, called Saliency with Contrast and Depth-Guided-Background (SCDGB), which combines the saliency obtained from a disparity map with low-level contrast. The disparity-based saliency takes advantage of compactness and background priors not only to highlight the salient object but also to eliminate non-salient objects. As for background priors, the depth-guided-background prior is specifically explored on the disparity map, in addition to the conventional boundary background prior [29]. The contrasts are composed of color contrast and disparity contrast, measured by compactness. Moreover, the 2D saliency map and the center-bias preference of human vision are employed to refine the final saliency map. Experimental results show that the proposed stereoscopic saliency model achieves superior performance compared with existing saliency approaches. The contributions of this paper are as follows:
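The combination just described can be sketched in code. This is a minimal illustration of cue fusion with a center bias, assuming a simple weighted sum with hypothetical parameters `alpha` and `sigma`; the paper's actual SCDGB weighting scheme is not reproduced here.

```python
import numpy as np

def center_bias(h, w, sigma=0.4):
    """Gaussian center-bias map, reflecting the human preference for
    looking near the image center (sigma is a hypothetical choice)."""
    ys, xs = np.mgrid[0:h, 0:w]
    r2 = (ys / h - 0.5) ** 2 + (xs / w - 0.5) ** 2
    return np.exp(-r2 / (2 * sigma ** 2))

def fuse_saliency(color_contrast, disparity_saliency, alpha=0.5):
    """One plausible fusion of the cues SCDGB names: a weighted sum of
    color contrast and disparity-based saliency, modulated by a
    center-bias map, then normalized to [0, 1]."""
    s = alpha * color_contrast + (1 - alpha) * disparity_saliency
    s = s * center_bias(*s.shape)
    return (s - s.min()) / (np.ptp(s) + 1e-8)
```

The weighted-sum form is only one design choice; the paper's scheme prioritizes contrasts with the disparity-based saliency rather than averaging fixed maps.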

  1. We propose a stereoscopic saliency model that unites contrast and saliency based on disparity.

  2. We develop a saliency based on disparity using the proposed depth-guided-background prior.

  3. We present a strategy to represent the contrast of stereoscopic images by fusing the multichannel contrasts.

The rest of this paper is organized as follows. A review of the related saliency models is given in Section 2. The proposed stereoscopic saliency detection model is elaborated in Section 3. An experiment and an evaluation of the proposed model are presented in Section 4. Conclusions are provided in Section 5.

Section snippets

Related work

During the past few decades, much work has been devoted to saliency models for images. Because our model is based on low-level contrast, we first review 2D saliency models based on color contrast. Then, we analyze co-saliency and video saliency models, and 3D saliency models that use disparity and high-level priors.

Proposed model description

Our study is related to low-level contrast features, including color and disparity. In contrast to previous methods, we propose a scheme to fuse color and disparity contrasts instead of handling color and disparity maps separately. Additionally, we present a saliency based on disparity that mimics the manner in which humans view images at differing depth levels to weight the contrast feature. Given a color image and a corresponding disparity map, we provide an overall expression to define
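The super-pixel contrast this section describes can be sketched with a generic spatially weighted global-contrast computation. This is an illustrative formulation with hypothetical parameters (e.g., `sigma`), not the paper's exact expression, which additionally weights contrasts with the disparity-based saliency.

```python
import numpy as np

def region_contrast(features, positions, sigma=0.25):
    """Global contrast over super-pixel regions: a region is salient if
    its feature (e.g., mean color or disparity) differs from the other
    regions', with spatially closer regions weighted more heavily."""
    n = len(features)
    sal = np.zeros(n)
    for i in range(n):
        fdist = np.linalg.norm(features - features[i], axis=1)    # feature difference
        sdist = np.linalg.norm(positions - positions[i], axis=1)  # spatial distance
        w = np.exp(-sdist ** 2 / (2 * sigma ** 2))                # proximity weight
        sal[i] = np.sum(w * fdist)
    return sal / (sal.max() + 1e-8)
```

With `features` holding per-region color vectors and a second call holding per-region disparity, the two resulting contrast maps could then be fused as the paper proposes.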

Experimental results

To evaluate the proposed model, we performed experiments on the SSB dataset provided by Niu et al. [32]. This publicly available dataset consists of 1000 pairs of stereoscopic images along with the corresponding masks of salient objects in the left images. Niu et al. follow the procedure designed by Liu et al. [60] to build the benchmark dataset. The most salient object in an image is marked with a rectangle by three users. The images with the least consistent labels are removed. Next, a mask of
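Saliency maps are commonly scored against such ground-truth masks with an adaptive-threshold F-measure. The sketch below uses the widely adopted threshold of twice the mean saliency and beta^2 = 0.3, as is standard in the salient-object-detection literature; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def adaptive_f_measure(saliency, gt_mask, beta2=0.3):
    """Binarize the saliency map at an adaptive threshold (twice its
    mean) and compute the weighted F-measure against the ground-truth
    salient-object mask."""
    pred = saliency >= 2 * saliency.mean()
    tp = np.logical_and(pred, gt_mask > 0).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max((gt_mask > 0).sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```

A perfect saliency map (identical to the mask) scores 1.0 under this measure.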

Conclusions

In this paper, we propose a saliency detection model for stereoscopic 3D images. We combine two background priors and compactness to develop saliency based on disparity. The background priors include a depth-guided-background prior explored through a disparity map and a boundary background. A saliency based on disparity approach is presented to give a priority weight to each region’s contrast. To measure the contrast of each region, we build a scheme to fuse the respective contrasts of

Acknowledgments

This research is partially sponsored by the National Natural Science Foundation of China [Grant numbers 61370113, 61472387, 61572004 and 61771026], the Beijing Municipal Natural Science Foundation [Grant numbers 4152005 and 4152006], the Science and Technology Program of Tianjin [Grant number 15YFXQGX0050], and the Science and Technology Planning Project of Qinghai Province [Grant number 2016-ZJ-Y04].

Fangfang Liang received the M.S. degree in Computer Science from Three Gorges University, Yichang, China, in 2010. She is currently pursuing the Ph.D. degree at the Faculty of Information Technology, Beijing University of Technology, China. Her current research interests include Image Processing, Machine Learning and Computer Vision.

References (63)

  • Li, J., et al., Probabilistic multi-task learning for visual saliency estimation in video, Int. J. Comput. Vis. (2010)
  • Borji, A., Boosting bottom-up and top-down visual features for saliency estimation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012)
  • Hou, X., et al., Saliency detection: a spectral residual approach, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2007)
  • Yan, Q., et al., Hierarchical saliency detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013)
  • Shen, J., et al., Lazy random walks for superpixel segmentation, IEEE Trans. Image Process. (2014)
  • Wang, W., et al., Saliency-aware geodesic video object segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
  • Wang, W., et al., Consistent video saliency using local gradient flow optimization and global refinement, IEEE Trans. Image Process. (2015)
  • Zhang, D., et al., Detection of co-salient objects by looking deep and wide, Int. J. Comput. Vis. (2016)
  • Zhang, D., et al., Revealing event saliency in unconstrained video collection, IEEE Trans. Image Process. (2017)
  • Courty, N., et al., A new application for saliency maps: synthetic vision of autonomous actors, Proceedings of the International Conference on Image Processing (ICIP) (2003)
  • Yoo, J.W., et al., Content-driven retargeting of stereoscopic images, IEEE Signal Process. Lett. (2013)
  • Chamaret, C., et al., Adaptive 3D rendering based on region-of-interest, Proceedings of IS&T/SPIE Electronic Imaging (2010)
  • Shao, F., et al., Perceptual full-reference quality assessment of stereoscopic images by considering binocular visual characteristics, IEEE Trans. Image Process. (2013)
  • Huynh-Thu, Q., et al., The importance of visual attention in improving the 3D-TV viewing experience: overview and new perspectives, IEEE Trans. Broadcast. (2011)
  • Sohn, H., et al., Attention model-based visual comfort assessment for stereoscopic depth perception, Proceedings of the 17th International Conference on Digital Signal Processing (DSP) (2011)
  • Wang, W., et al., Stereoscopic thumbnail creation via efficient stereo saliency detection, IEEE Trans. Vis. Comput. Graph. (2017)
  • Lei, J., et al., Stereoscopic visual attention guided disparity control for multiview images, J. Disp. Technol. (2014)
  • Fang, Y., et al., Saliency detection for stereoscopic images, IEEE Trans. Image Process. (2014)
  • Li, N., et al., A weighted sparse coding framework for saliency detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
  • Geng, J., Three-dimensional display technologies, Adv. Opt. Photon. (2013)
  • Jansen, L., et al., Influence of disparity on fixation and saccades in free viewing of natural scenes, J. Vis. (2009)


    Lijuan Duan received the B.Sc. and M.Sc. degrees in computer science from Zhengzhou University of Technology, Zhengzhou, China, in 1995 and 1998, respectively. She received the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in 2003. She is currently a Professor at the Faculty of Information Technology, Beijing University of Technology, China. Her research interests include Artificial Intelligence, Image Processing, Computer Vision and Information Security. She has published more than 70 research articles in refereed journals and proceedings on artificial intelligence, image processing and computer vision.

    Wei Ma received her Ph.D. degree in Computer Science from Peking University, in 2009. She is currently an Associate Professor at the Faculty of Information Technology, Beijing University of Technology, China. Her research interests include Image Processing, Computer Vision and their applications in the protection and exhibition of Chinese ancient paintings.

    Yuanhua Qiao received the B.S. degree in mathematics from Qilu Normal University, Jinan, Shandong, in 1992, the M.S. degree in Applied Science from Beijing University of Technology, Beijing, in 1999, and the Ph.D. degree in fluid mechanics from the College of Life Science and Biotechnology, Beijing, in 2005. Since 1999, she has served successively as a Research Assistant, Associate Professor and Professor in Applied Science at Beijing University of Technology, China. Her research interests include dynamic analysis of neuronal networks, synchronization analysis of neuronal networks, differential equations and dynamical systems. Professor Qiao is a member of the Mathematics Education society; in 2006, she received a Project Completion Certificate (Ministry of Education).

    Zhi Cai is a lecturer in the College of Computer Science, Beijing University of Technology, China. He obtained his M.Sc. in 2007 from the School of Computer Science in the University of Manchester and his Ph.D. in 2011 from the Department of Computing and Mathematics of the Manchester Metropolitan University, U.K. His research interests include Information Retrieval, Ranking in Relational Databases, Keyword Search, Data Mining, Big Data Management & Analysis and Ontology Engineering.

    Laiyun Qing is with the School of Information Science and Engineering, Graduate University of the Chinese Academy of Sciences, China. She received her Ph.D. in computer science from Chinese Academy of Sciences in 2005. Her research interests include pattern recognition, image processing and statistical learning. Her current research focuses on neural information processing.
