
SAGAN: Self-attention Generative Adversarial Network for RGB-D Saliency Prediction

  • Conference paper
  • First Online:
Image and Graphics (ICIG 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14356)


Abstract

A key problem in RGB-D saliency prediction is how to effectively exploit multi-modal cues. In this paper, we propose a self-attention generative adversarial network (SAGAN) for RGB-D saliency prediction, which extracts heterogeneous features from RGB and depth through long-range dependency modeling and adversarial training. Specifically, we explore selective fusion based on channel attention (SFA) and a prior-initialization approach to efficiently learn the salient cues of RGB-D images: SFA adaptively selects and fuses features at different levels between the RGB and depth modalities, while prior initialization reuses RGB prior weights to reduce the demand for annotated RGB-D datasets and to accelerate the convergence of model training. Extensive experiments on two publicly available datasets demonstrate the superiority of our approach over state-of-the-art methods.
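The three mechanisms the abstract names can be made concrete with short sketches. First, a minimal self-attention layer in the style of Zhang et al.'s SAGAN, which models long-range dependencies by letting every spatial position attend to every other; the 1×1-convolution projections and the channels // 8 key dimension follow that paper's common formulation, and everything else here is illustrative rather than the authors' released code.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """SAGAN-style non-local self-attention: every spatial position
    attends to every other, capturing long-range dependencies."""

    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.k = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (b, h*w, c//8)
        k = self.k(x).flatten(2)                   # (b, c//8, h*w)
        attn = torch.softmax(q @ k, dim=-1)        # (b, h*w, h*w)
        v = self.v(x).flatten(2)                   # (b, c, h*w)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                # residual connection
```

Second, a sketch of what a channel-attention selective fusion (SFA) block could look like, assuming a squeeze-and-excitation-style bottleneck with a softmax over the modality axis, so the network picks, channel by channel, how much to trust RGB versus depth at a given level; the class name, reduction ratio, and tensor shapes are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SFA(nn.Module):
    """Selective fusion via channel attention: gate same-level RGB and
    depth features so their weights sum to one per channel."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(  # bottleneck emitting 2 gates per channel
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = rgb.shape
        # Squeeze: pool global context from both modalities.
        ctx = torch.cat([self.pool(rgb), self.pool(depth)], dim=1).flatten(1)
        # Softmax over the modality axis = channel-wise selection.
        w = torch.softmax(self.fc(ctx).view(b, 2, c, 1, 1), dim=1)
        return w[:, 0] * rgb + w[:, 1] * depth

# Fuse same-resolution features from the RGB and depth encoder streams.
fused = SFA(256)(torch.randn(2, 256, 28, 28), torch.randn(2, 256, 28, 28))
```

Third, a sketch of the prior-initialization idea: reuse RGB-pretrained encoder weights for the depth stream, cutting the demand for annotated RGB-D data and speeding convergence. The VGG-16 backbone and the first-convolution averaging trick for single-channel depth input are common heuristics assumed here, not details taken from the paper.

```python
import torch
import torchvision

# RGB encoder keeps ImageNet weights; depth encoder starts from scratch.
rgb_encoder = torchvision.models.vgg16(weights="IMAGENET1K_V1").features
depth_encoder = torchvision.models.vgg16().features

state = rgb_encoder.state_dict()
# Depth is single-channel: average the first layer's 3-channel filters,
# then reuse every remaining RGB prior weight unchanged.
state["0.weight"] = state["0.weight"].mean(dim=1, keepdim=True)
depth_encoder[0] = torch.nn.Conv2d(1, 64, kernel_size=3, padding=1)
depth_encoder.load_state_dict(state)
```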



Author information


Corresponding author

Correspondence to Yongfang Wang.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, Y., Xiao, S., Ye, P. (2023). SAGAN: Self-attention Generative Adversarial Network for RGB-D Saliency Prediction. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14356. Springer, Cham. https://doi.org/10.1007/978-3-031-46308-2_10


  • DOI: https://doi.org/10.1007/978-3-031-46308-2_10

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46307-5

  • Online ISBN: 978-3-031-46308-2

  • eBook Packages: Computer Science, Computer Science (R0)
