
Salient object detection for RGB-D images by generative adversarial network

Published in: Multimedia Tools and Applications

Abstract

Salient object detection for RGB-D images aims to automatically detect objects of human interest using color and depth information. In this paper, a generative adversarial network is adopted to improve detection performance through adversarial learning. The generator network takes RGB-D images as input and outputs synthetic saliency maps. It adopts a double-stream network to extract color and depth features individually and then fuses them progressively from deep to shallow layers. The discriminator network takes an RGB image paired with a synthetic saliency map (RGBS) or with the ground-truth saliency map (RGBY) as input, and outputs a label indicating whether the input is synthetic or ground truth. It consists of three convolution blocks and three fully connected layers. To capture long-range feature dependencies, a self-attention layer is inserted into both the generator and the discriminator networks. Supervised by real labels and ground-truth saliency maps, the discriminator and generator networks are trained adversarially, so that the generator learns to fool the discriminator while the discriminator learns to correctly distinguish synthetic maps from ground truth. Experiments demonstrate that adversarial learning enhances the ability of the generator network, and that the RGBS/RGBY inputs to the discriminator and the self-attention layer play an important role in improving performance. Our method also outperforms state-of-the-art methods.
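
To make the self-attention component concrete, the following is a minimal PyTorch sketch of a SAGAN-style self-attention layer, the kind of layer inserted into both the generator and the discriminator to capture long-range feature dependencies. The channel reduction to C/8 and the zero-initialised residual scale gamma are common-practice assumptions here, not details taken from the paper.

# Minimal sketch (not the authors' code) of a SAGAN-style self-attention layer.
# Assumes in_channels is at least 8 so the C/8 reduction is valid.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        # Query and key are projected to a reduced channel dimension (C/8).
        self.query = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        # Learnable residual weight, initialised to 0 so training starts
        # from the plain convolutional features.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)   # B x HW x C/8
        k = self.key(x).view(b, -1, h * w)                       # B x C/8 x HW
        attn = F.softmax(torch.bmm(q, k), dim=-1)                # B x HW x HW
        v = self.value(x).view(b, -1, h * w)                     # B x C x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x

Because the output of such a layer has the same shape as its input, it can be dropped between convolution blocks without changing the surrounding architecture.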




Acknowledgment

We thank Dr. Hao Chen from City University of Hong Kong for providing their resulting saliency maps. We also thank Prof. Ming-ming Cheng and Dr. Deng-ping Fan from Nankai University for providing the code for all evaluation metrics. We further thank all anonymous reviewers for their valuable comments. This research was supported by the National Natural Science Foundation of China (61602004), the Natural Science Foundation of Anhui Province (1908085MF182), and the Key Program of the Natural Science Project of the Educational Commission of Anhui Province (KJ2019A0034).

Author information


Corresponding author

Correspondence to Zhengyi Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, Z., Tang, J., Xiang, Q. et al. Salient object detection for RGB-D images by generative adversarial network. Multimed Tools Appl 79, 25403–25425 (2020). https://doi.org/10.1007/s11042-020-09188-8

