Scale Adaptive Fusion Network for RGB-D Salient Object Detection

Kong, Yuqiu; Zheng, Yushuo; Yao, Cuili; Liu, Yang; Wang, He

doi:10.1007/978-3-031-26313-2_37

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13843)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

RGB-D Salient Object Detection (SOD) is a fundamental problem in computer vision and relies heavily on multi-modal interaction between RGB and depth information. However, most existing approaches adopt the same fusion module to integrate RGB and depth features at multiple scales of the network, without distinguishing the unique attributes of different layers, e.g., the geometric information at the shallower scales, the structural features at the middle scales, and the semantic cues at the deeper scales. In this work, we propose a Scale Adaptive Fusion Network (SAFNet) for RGB-D SOD, which employs scale adaptive modules to fuse the RGB-D features. Specifically, for the shallow scale, we adopt an early fusion strategy by mapping the 2D RGB-D images to a 3D point cloud and learning a unified representation of the geometric information in 3D space. For the middle scale, we model the structural features of the two modalities by exploring spatial contrast information from the depth space. For the deep scale, we design a depth-aware channel-wise attention module to enhance the semantic representation of the two modalities. Extensive experiments demonstrate the superiority of the scale adaptive fusion strategy adopted by our method. The proposed SAFNet achieves favourable performance against state-of-the-art algorithms on six large-scale benchmarks.
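
To make the shallow-scale step concrete, the mapping from a 2D RGB-D image to a 3D point cloud can be sketched with the standard pinhole back-projection. This is an illustrative sketch only, not the SAFNet implementation: the abstract does not state the camera model, so the function name `rgbd_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions introduced here.

```python
import numpy as np


def rgbd_to_point_cloud(rgb, depth, fx, fy, cx, cy):
    """Back-project an RGB-D image into an (N, 6) array of XYZRGB points.

    Assumes a pinhole camera (hypothetical intrinsics fx, fy, cx, cy),
    an (H, W, 3) uint8 colour image and an (H, W) depth map in which
    non-positive values mark missing measurements.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    xyz = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Scale colours to [0, 1] so they can sit next to the coordinates.
    colors = rgb.reshape(-1, 3).astype(np.float64) / 255.0
    valid = xyz[:, 2] > 0  # drop pixels without a depth reading
    return np.concatenate([xyz[valid], colors[valid]], axis=1)


if __name__ == "__main__":
    rgb = np.full((4, 6, 3), 255, dtype=np.uint8)
    depth = np.ones((4, 6))
    depth[0, 0] = 0.0  # one missing depth value
    cloud = rgbd_to_point_cloud(rgb, depth, fx=5.0, fy=5.0, cx=3.0, cy=2.0)
    print(cloud.shape)
```

Each surviving pixel becomes one point carrying both position and colour, which is the kind of unified geometric input a point-based network could consume. How SAFNet samples, normalises, or encodes these points is not specified in the abstract.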

Acknowledgements

This work is supported by the Ministry of Science and Technology of the People’s Republic of China (no. 2018AAA0102003), the National Natural Science Foundation of China (Grant no. 62006037), and the Fundamental Research Funds for the Central Universities (no. DUT22JC06).

Author information

Authors and Affiliations

School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, China
Yuqiu Kong, Cuili Yao & Yang Liu
School of Mechanical Engineering, Dalian University of Technology, Dalian, China
Yushuo Zheng
School of Computer Science and Technology, Dalian University of Technology, Dalian, China
He Wang

Corresponding author

Correspondence to Cuili Yao.

Editor information

Editors and Affiliations

University of Wollongong, Wollongong, NSW, Australia
Lei Wang
University of Bonn, Bonn, Germany
Juergen Gall
University of Adelaide, Adelaide, SA, Australia
Tat-Jun Chin
National Institute of Informatics, Tokyo, Japan
Imari Sato
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa

Cite this paper

Kong, Y., Zheng, Y., Yao, C., Liu, Y., Wang, H. (2023). Scale Adaptive Fusion Network for RGB-D Salient Object Detection. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13843. Springer, Cham. https://doi.org/10.1007/978-3-031-26313-2_37

DOI: https://doi.org/10.1007/978-3-031-26313-2_37
Published: 02 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26312-5
Online ISBN: 978-3-031-26313-2
eBook Packages: Computer Science, Computer Science (R0)
