Abstract
A key problem in RGB-D saliency prediction is how to effectively exploit multi-modal cues. In this paper, we propose a self-attention generative adversarial network (SAGAN) for RGB-D saliency prediction, which extracts heterogeneous features from RGB and depth through long-range dependency modeling and adversarial training. Specifically, we explore a selective fusion module based on channel attention (SFA) and a prior-initialization approach to efficiently learn the salient cues of RGB-D images. SFA is designed to adaptively select and fuse RGB and depth features at different levels, while prior initialization reuses RGB prior weights to reduce the demand for annotated RGB-D data and to accelerate the convergence of model training. Extensive experiments on two publicly available datasets demonstrate the superiority of our approach over other state-of-the-art methods.
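To make the idea behind SFA concrete, the following is a minimal sketch of a channel-attention fusion block for same-resolution RGB and depth feature maps. It is not the authors' released code: the layer sizes, the global-average-pooling squeeze, the bottleneck MLP, and the softmax over the two modalities are assumptions in the spirit of SE/SK-style channel attention, used here only to illustrate how per-channel weights can adaptively select between the two modalities.

```python
import torch
import torch.nn as nn


class SelectiveFusionAttention(nn.Module):
    """Illustrative fusion of RGB and depth features via channel attention (assumed design)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # global context per channel
        self.excite = nn.Sequential(            # bottleneck MLP predicting modality weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = rgb_feat.shape
        # Squeeze the summed features into a per-channel descriptor.
        s = self.squeeze(rgb_feat + depth_feat).view(b, c)
        # Predict per-channel weights for each modality; a softmax over the
        # modality axis makes RGB and depth compete for each channel.
        w = torch.softmax(self.excite(s).view(b, 2, c), dim=1)
        w_rgb = w[:, 0].view(b, c, 1, 1)
        w_depth = w[:, 1].view(b, c, 1, 1)
        return w_rgb * rgb_feat + w_depth * depth_feat


if __name__ == "__main__":
    fuse = SelectiveFusionAttention(channels=64)
    rgb = torch.randn(2, 64, 56, 56)
    depth = torch.randn(2, 64, 56, 56)
    print(fuse(rgb, depth).shape)  # torch.Size([2, 64, 56, 56])
```

Under the same assumptions, the prior-initialization step mentioned in the abstract would simply copy pretrained RGB-branch weights into the depth branch before fine-tuning, so that only the fusion and discriminator parts start from scratch.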