Abstract
A key problem in RGB-D saliency prediction is how to effectively exploit multi-modal cues. In this paper, we propose a self-attention generative adversarial network (SAGAN) for RGB-D saliency prediction, which extracts heterogeneous features from RGB and depth through long-range dependency modeling and adversarial training. Specifically, we explore a selective fusion module based on channel attention (SFA) and a prior-initialization approach to efficiently learn the salient cues of RGB-D images. SFA is designed to adaptively select and fuse RGB and depth features at different levels, while prior initialization reuses RGB prior weights to reduce the demand for annotated RGB-D data and to accelerate the convergence of model training. Extensive experiments on two publicly available datasets demonstrate the superiority of our approach over other state-of-the-art methods.
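To make the idea behind SFA concrete, the following is a minimal sketch of a channel-attention fusion block for same-resolution RGB and depth feature maps. It is not the authors' released code: the layer sizes, the global-average-pooling squeeze, the bottleneck MLP, and the softmax over the two modalities are assumptions in the spirit of SE/SK-style channel attention, used here only to illustrate how per-channel weights can adaptively select between the two modalities.

```python
import torch
import torch.nn as nn


class SelectiveFusionAttention(nn.Module):
    """Illustrative fusion of RGB and depth features via channel attention (assumed design)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # global context per channel
        self.excite = nn.Sequential(            # bottleneck MLP predicting modality weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = rgb_feat.shape
        # Squeeze the summed features into a per-channel descriptor.
        s = self.squeeze(rgb_feat + depth_feat).view(b, c)
        # Predict per-channel weights for each modality; a softmax over the
        # modality axis makes RGB and depth compete for each channel.
        w = torch.softmax(self.excite(s).view(b, 2, c), dim=1)
        w_rgb = w[:, 0].view(b, c, 1, 1)
        w_depth = w[:, 1].view(b, c, 1, 1)
        return w_rgb * rgb_feat + w_depth * depth_feat


if __name__ == "__main__":
    fuse = SelectiveFusionAttention(channels=64)
    rgb = torch.randn(2, 64, 56, 56)
    depth = torch.randn(2, 64, 56, 56)
    print(fuse(rgb, depth).shape)  # torch.Size([2, 64, 56, 56])
```

Under the same assumptions, the prior-initialization step mentioned in the abstract would simply copy pretrained RGB-branch weights into the depth branch before fine-tuning, so that only the fusion and discriminator parts start from scratch.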