
Salient object detection for RGB-D images by generative adversarial network

Published in: Multimedia Tools and Applications

Abstract

Salient object detection for RGB-D images aims to automatically detect objects of human interest using color and depth information. In this paper, a generative adversarial network is adopted to improve detection performance through adversarial learning. The generator network takes RGB-D images as input and outputs synthetic saliency maps. It adopts a double-stream network to extract color and depth features individually and then fuses them progressively from deep to shallow layers. The discriminator network takes an RGB image paired with a synthetic saliency map (RGBS) or with the ground-truth saliency map (RGBY) as input, and outputs a label indicating whether the input is synthetic or ground truth. It consists of three convolution blocks and three fully connected layers. To capture long-range feature dependencies, a self-attention layer is inserted into both the generator and the discriminator networks. Supervised by real labels and ground-truth saliency maps, the discriminator and generator networks are trained adversarially, so that the generator learns to fool the discriminator while the discriminator learns to correctly distinguish synthetic maps from ground truth. Experiments demonstrate that adversarial learning enhances the ability of the generator network, and that the RGBS/RGBY inputs to the discriminator and the self-attention layer play an important role in improving performance. Our method also outperforms state-of-the-art methods.
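
To make the self-attention component concrete, the following is a minimal PyTorch sketch of a SAGAN-style self-attention layer, the kind of layer inserted into both the generator and the discriminator to capture long-range feature dependencies. The channel reduction to C/8 and the zero-initialised residual scale gamma are common-practice assumptions here, not details taken from the paper.

# Minimal sketch (not the authors' code) of a SAGAN-style self-attention layer.
# Assumes in_channels is at least 8 so the C/8 reduction is valid.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        # Query and key are projected to a reduced channel dimension (C/8).
        self.query = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        # Learnable residual weight, initialised to 0 so training starts
        # from the plain convolutional features.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)   # B x HW x C/8
        k = self.key(x).view(b, -1, h * w)                       # B x C/8 x HW
        attn = F.softmax(torch.bmm(q, k), dim=-1)                # B x HW x HW
        v = self.value(x).view(b, -1, h * w)                     # B x C x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x

Because the output of such a layer has the same shape as its input, it can be dropped between convolution blocks without changing the surrounding architecture.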




Acknowledgment

We thank Dr. Hao Chen from City University of Hong Kong for providing their resulting saliency maps. We also thank Prof. Ming-ming Cheng and Dr. Deng-ping Fan from Nankai University for providing the code for all evaluation metrics. We further thank all anonymous reviewers for their valuable comments. This research was supported by the National Natural Science Foundation of China (61602004), the Natural Science Foundation of Anhui Province (1908085MF182), and the Key Program of the Natural Science Project of the Educational Commission of Anhui Province (KJ2019A0034).

Author information


Corresponding author

Correspondence to Zhengyi Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, Z., Tang, J., Xiang, Q. et al. Salient object detection for RGB-D images by generative adversarial network. Multimed Tools Appl 79, 25403–25425 (2020). https://doi.org/10.1007/s11042-020-09188-8

