Elsevier

Neurocomputing

Volume 275, 31 January 2018, Pages 2227-2238

Stereoscopic saliency model using contrast and depth-guided-background prior

https://doi.org/10.1016/j.neucom.2017.10.052

Abstract

Many successful saliency models have been proposed to detect salient regions in 2D images. Because stereopsis, with its distinctive depth information, influences human viewing, stereoscopic saliency detection must consider depth as an additional cue. In this paper, we propose a 3D stereoscopic saliency model based on both contrast and a depth-guided-background prior. First, a depth-guided-background prior is detected from the disparity map, in addition to the conventional prior that assumes boundary super-pixels are background. Then, a disparity-based saliency, computed with the help of the proposed prior, is used to prioritize the contrasts among super-pixels. In addition, a scheme that combines disparity contrast and color contrast is presented. Finally, 2D spatial dissimilarity features are further employed to refine the saliency map. Experimental results on the PSU stereo saliency benchmark dataset (SSB) show that the proposed method outperforms existing saliency models.

Introduction

The human visual system (HVS) is remarkably adept at determining the important aspects of a scene as the eyes acquire information. To simulate the HVS mechanism, two distinct classes of saliency models are applied: bottom-up (stimulus-driven) and top-down (task-driven) [1], [2], [3], [4]. Bottom-up models extract center-surround difference features (local or global) from low-level information in different channels and then combine these feature maps to build a saliency map [5], [6], [7]. Although many image and video saliency models are used in vision tasks [8], [9], [10], [11], [12], with the development of 3D technologies, stereoscopic visual saliency models have attracted increasing attention for diverse applications such as synthetic vision [13], retargeting [14], rendering [15], quality assessment [16], [17], visual discomfort [18], [19], stereoscopic thumbnail creation [20] and disparity control [21]. However, research focusing on stereoscopic saliency models [22], [23], [24] remains scarce compared with the rapid expansion of saliency models for 2D images. Overall, much work remains to be done before 3D stereoscopic saliency models can approach the capabilities of the human visual system.

Because the sense of immersion [25] is enhanced by the binocular parallax generated by a stereo-channel-separated display, the binocular depth cues introduced by 3D displays have changed human viewing behavior [26], [27]. Therefore, this additional depth cue should be considered in a stereoscopic saliency model. Although current bottom-up stereoscopic saliency models are effective, several questions remain to be addressed:

  • How can a salient object be detected in a scene where the colors are similar? The colors of salient and non-salient objects are sometimes similar in natural images, so methods based on color contrast tend to obtain low values inside salient objects. In this situation, disparity cues can help to complete the whole salient object (e.g., the first row in Fig. 1(e)–(g)). Color and disparity are both useful in computing saliency and are, to some extent, relevant to each other. Because the interaction between disparity and 2D information is usually ignored, we introduce compactness [28] from the color map to measure this relationship while calculating contrast (e.g., the first row in Fig. 1(d)).

  • How can a salient object be detected in a scene with a cluttered background? Current bottom-up saliency methods have difficulty coping with images whose non-salient parts are cluttered (e.g., the second row of Fig. 1(a)). Previous models sometimes render non-salient parts as salient because of their high contrast with the surroundings (e.g., the second row in Fig. 1(e) and (f)). This can be avoided by using a disparity map (e.g., Fig. 1(g)). Although the boundary background prior [29], [30] has been verified as effective, we find that an additional prior can be extracted from the disparity map based on two observations: 1) non-salient areas distant from the viewer tend to form a smooth surface on the disparity map; 2) there is an obvious discontinuity between the salient and non-salient parts. Hence, taking multiple cues into consideration, including compactness and background priors, helps form a complete stereoscopic saliency analysis (e.g., the second row in Fig. 1(d)).

  • How can a salient object be detected from the disparity cue in a cluttered scene? For a simple scene, a good saliency map (first row of Fig. 1(g)) can be obtained from the disparity map alone (first row of Fig. 1(c)). However, disparity-based detection copes poorly with cluttered scenes (second row of Fig. 1(g)): some background regions receive high saliency values because they are close in depth to the salient ones. When the color feature (compactness [28]) and background priors are considered together, the non-salient background can be suppressed to a certain degree (e.g., Fig. 1(d)).
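The two observations about the disparity map (distant non-salient areas form a smooth surface; a depth discontinuity separates salient from non-salient parts) suggest a simple computational sketch. The following is an illustrative, hypothetical implementation of a depth-guided background prior, not the paper's exact formulation; the parameters `far_quantile` and `grad_thresh` are assumptions chosen for the example.

```python
import numpy as np

def depth_background_prior(disparity, far_quantile=0.3, grad_thresh=0.05):
    """Illustrative sketch of a depth-guided background prior: pixels that
    are distant from the viewer (low disparity) and lie on a locally
    smooth surface are marked as probable background."""
    # Normalize disparity to [0, 1]; larger values mean closer to the viewer.
    d = (disparity - disparity.min()) / (np.ptp(disparity) + 1e-8)
    # Observation 1: distant non-salient areas form a smooth surface.
    gy, gx = np.gradient(d)
    smooth = np.hypot(gx, gy) < grad_thresh
    # Observation 2: a depth discontinuity separates the salient object,
    # so keep only the far (low-disparity) portion as background.
    far = d <= np.quantile(d, far_quantile)
    return (smooth & far).astype(float)  # 1.0 = probable background
```

On a synthetic disparity map with a near foreground square over a flat distant plane, this marks the plane as background and leaves the square out.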

In this paper, we propose a unified stereoscopic saliency model, called Saliency with Contrast and Depth-Guided-Background (SCDGB), which combines the saliency obtained from a disparity map with low-level contrast. The disparity-based saliency takes advantage of compactness and background priors not only to highlight the salient object but also to eliminate non-salient objects. As for background priors, the depth-guided-background prior is specifically explored on the disparity map, in addition to the conventional boundary background prior [29]. The contrasts are composed of color contrast and disparity contrast, measured by compactness. Moreover, the 2D saliency map and the center-bias preference of human vision are employed to refine the final saliency map. Experimental results show that the proposed stereoscopic saliency model achieves superior performance compared with existing saliency approaches. The contributions of this paper are as follows:
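The combination just described can be sketched in code. This is a minimal illustration of cue fusion with a center bias, assuming a simple weighted sum with hypothetical parameters `alpha` and `sigma`; the paper's actual SCDGB weighting scheme is not reproduced here.

```python
import numpy as np

def center_bias(h, w, sigma=0.4):
    """Gaussian center-bias map, reflecting the human preference for
    looking near the image center (sigma is a hypothetical choice)."""
    ys, xs = np.mgrid[0:h, 0:w]
    r2 = (ys / h - 0.5) ** 2 + (xs / w - 0.5) ** 2
    return np.exp(-r2 / (2 * sigma ** 2))

def fuse_saliency(color_contrast, disparity_saliency, alpha=0.5):
    """One plausible fusion of the cues SCDGB names: a weighted sum of
    color contrast and disparity-based saliency, modulated by a
    center-bias map, then normalized to [0, 1]."""
    s = alpha * color_contrast + (1 - alpha) * disparity_saliency
    s = s * center_bias(*s.shape)
    return (s - s.min()) / (np.ptp(s) + 1e-8)
```

The weighted-sum form is only one design choice; the paper's scheme prioritizes contrasts with the disparity-based saliency rather than averaging fixed maps.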

  1. We propose a stereoscopic saliency model that unites contrast and saliency based on disparity.

  2. We develop a saliency based on disparity using the proposed depth-guided-background prior.

  3. We present a strategy to represent the contrast of stereoscopic images by fusing the multichannel contrasts.

The rest of this paper is organized as follows. A review of the related saliency models is given in Section 2. The proposed stereoscopic saliency detection model is elaborated in Section 3. An experiment and an evaluation of the proposed model are presented in Section 4. Conclusions are provided in Section 5.

Section snippets

Related work

During the past few decades, much work has been devoted to saliency models for images. Because our model is based on low-level contrast, we first review 2D saliency models based on color contrast. Then, we analyze co-saliency and video saliency models, and 3D saliency models that use disparity and high-level priors.

Proposed model description

Our study is related to low-level contrast features, including color and disparity. In contrast to previous methods, we propose a scheme to fuse color and disparity contrasts instead of handling color and disparity maps separately. Additionally, we present a saliency based on disparity that mimics the manner in which humans view images at differing depth levels to weight the contrast feature. Given a color image and a corresponding disparity map, we provide an overall expression to define
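The super-pixel contrast this section describes can be sketched with a generic spatially weighted global-contrast computation. This is an illustrative formulation with hypothetical parameters (e.g., `sigma`), not the paper's exact expression, which additionally weights contrasts with the disparity-based saliency.

```python
import numpy as np

def region_contrast(features, positions, sigma=0.25):
    """Global contrast over super-pixel regions: a region is salient if
    its feature (e.g., mean color or disparity) differs from the other
    regions', with spatially closer regions weighted more heavily."""
    n = len(features)
    sal = np.zeros(n)
    for i in range(n):
        fdist = np.linalg.norm(features - features[i], axis=1)    # feature difference
        sdist = np.linalg.norm(positions - positions[i], axis=1)  # spatial distance
        w = np.exp(-sdist ** 2 / (2 * sigma ** 2))                # proximity weight
        sal[i] = np.sum(w * fdist)
    return sal / (sal.max() + 1e-8)
```

With `features` holding per-region color vectors and a second call holding per-region disparity, the two resulting contrast maps could then be fused as the paper proposes.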

Experimental results

To evaluate the proposed model, we performed experiments on the SSB dataset provided by Niu et al. [32]. This publicly available dataset consists of 1000 pairs of stereoscopic images along with the corresponding masks of salient objects in the left images. Niu et al. follow the procedure designed by Liu et al. [60] to build the benchmark dataset. The most salient object in an image is marked with a rectangle by three users. The images with the least consistent labels are removed. Next, a mask of
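Saliency maps are commonly scored against such ground-truth masks with an adaptive-threshold F-measure. The sketch below uses the widely adopted threshold of twice the mean saliency and beta^2 = 0.3, as is standard in the salient-object-detection literature; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def adaptive_f_measure(saliency, gt_mask, beta2=0.3):
    """Binarize the saliency map at an adaptive threshold (twice its
    mean) and compute the weighted F-measure against the ground-truth
    salient-object mask."""
    pred = saliency >= 2 * saliency.mean()
    tp = np.logical_and(pred, gt_mask > 0).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max((gt_mask > 0).sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```

A perfect saliency map (identical to the mask) scores 1.0 under this measure.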

Conclusions

In this paper, we propose a saliency detection model for stereoscopic 3D images. We combine two background priors and compactness to develop saliency based on disparity. The background priors include a depth-guided-background prior explored through a disparity map and a boundary background. A saliency based on disparity approach is presented to give a priority weight to each region’s contrast. To measure the contrast of each region, we build a scheme to fuse the respective contrasts of

Acknowledgments

This research is partially sponsored by the National Natural Science Foundation of China [Grant numbers 61370113, 61472387, 61572004 and 61771026], the Beijing Municipal Natural Science Foundation [Grant numbers 4152005 and 4152006], the Science and Technology Program of Tianjin [Grant number 15YFXQGX0050], and the Science and Technology Planning Project of Qinghai Province [Grant number 2016-ZJ-Y04].

Fangfang Liang received the M.S. degree in Computer Science from Three Gorges University, Yichang, China, in 2010. She is currently pursuing the Ph.D. degree at the Faculty of Information Technology, Beijing University of Technology, China. Her current research interests include Image Processing, Machine Learning and Computer Vision.

References (63)

  • Li, J., et al., Probabilistic multi-task learning for visual saliency estimation in video, Int. J. Comput. Vis. (2010)
  • Borji, A., Boosting bottom-up and top-down visual features for saliency estimation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012)
  • Hou, X., et al., Saliency detection: a spectral residual approach, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2007)
  • Yan, Q., et al., Hierarchical saliency detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013)
  • Shen, J., et al., Lazy random walks for superpixel segmentation, IEEE Trans. Image Process. (2014)
  • Wang, W., et al., Saliency-aware geodesic video object segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
  • Wang, W., et al., Consistent video saliency using local gradient flow optimization and global refinement, IEEE Trans. Image Process. (2015)
  • Zhang, D., et al., Detection of co-salient objects by looking deep and wide, Int. J. Comput. Vis. (2016)
  • Zhang, D., et al., Revealing event saliency in unconstrained video collection, IEEE Trans. Image Process. (2017)
  • Courty, N., et al., A new application for saliency maps: synthetic vision of autonomous actors, Proceedings of the International Conference on Image Processing (ICIP) (2003)
  • Yoo, J.W., et al., Content-driven retargeting of stereoscopic images, IEEE Signal Process. Lett. (2013)
  • Chamaret, C., et al., Adaptive 3D rendering based on region-of-interest, Proceedings of IS&T/SPIE Electronic Imaging (2010)
  • Shao, F., et al., Perceptual full-reference quality assessment of stereoscopic images by considering binocular visual characteristics, IEEE Trans. Image Process. (2013)
  • Huynh-Thu, Q., et al., The importance of visual attention in improving the 3D-TV viewing experience: overview and new perspectives, IEEE Trans. Broadcast. (2011)
  • Sohn, H., et al., Attention model-based visual comfort assessment for stereoscopic depth perception, Proceedings of the 17th International Conference on Digital Signal Processing (DSP) (2011)
  • Wang, W., et al., Stereoscopic thumbnail creation via efficient stereo saliency detection, IEEE Trans. Vis. Comput. Graph. (2017)
  • Lei, J., et al., Stereoscopic visual attention guided disparity control for multiview images, J. Disp. Technol. (2014)
  • Fang, Y., et al., Saliency detection for stereoscopic images, IEEE Trans. Image Process. (2014)
  • Li, N., et al., A weighted sparse coding framework for saliency detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
  • Geng, J., Three-dimensional display technologies, Adv. Opt. Photon. (2013)
  • Jansen, L., et al., Influence of disparity on fixation and saccades in free viewing of natural scenes, J. Vis. (2009)


    Lijuan Duan received the B.Sc. and M.Sc. degrees in computer science from Zhengzhou University of Technology, Zhengzhou, China, in 1995 and 1998, respectively. She received the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in 2003. She is currently a Professor at the Faculty of Information Technology, Beijing University of Technology, China. Her research interests include Artificial Intelligence, Image Processing, Computer Vision and Information Security. She has published more than 70 research articles in refereed journals and proceedings on artificial intelligence, image processing and computer vision.

    Wei Ma received her Ph.D. degree in Computer Science from Peking University, in 2009. She is currently an Associate Professor at the Faculty of Information Technology, Beijing University of Technology, China. Her research interests include Image Processing, Computer Vision and their applications in the protection and exhibition of Chinese ancient paintings.

    Yuanhua Qiao received the B.S. degree in mathematics from Qilu Normal University, Jinan, Shandong, in 1992, the M.S. degree in Applied Science from Beijing University of Technology, Beijing, in 1999, and the Ph.D. degree in fluid mechanics from the College of Life Science and Biotechnology, Beijing, in 2005. Since 1999, she has served successively as a Research Assistant, Associate Professor and Professor in Applied Science at Beijing University of Technology, China. Her research interests include dynamic analysis of neuronal networks, synchronization analysis of neuronal networks, differential equations and dynamical systems. Professor Qiao is a member of the Mathematics Education society; in 2006, she received a Project Completion Certificate (Ministry of Education).

    Zhi Cai is a lecturer in the College of Computer Science, Beijing University of Technology, China. He obtained his M.Sc. in 2007 from the School of Computer Science in the University of Manchester and his Ph.D. in 2011 from the Department of Computing and Mathematics of the Manchester Metropolitan University, U.K. His research interests include Information Retrieval, Ranking in Relational Databases, Keyword Search, Data Mining, Big Data Management & Analysis and Ontology Engineering.

    Laiyun Qing is with the School of Information Science and Engineering, Graduate University of the Chinese Academy of Sciences, China. She received her Ph.D. in computer science from Chinese Academy of Sciences in 2005. Her research interests include pattern recognition, image processing and statistical learning. Her current research focuses on neural information processing.
