Abstract
RGB-D Salient Object Detection (SOD) is a fundamental problem in the field of computer vision and relies heavily on multi-modal interaction between the RGB and depth information. However, most existing approaches adopt the same fusion module to integrate RGB and depth features in multiple scales of the networks, without distinguishing the unique attributes of different layers, e.g., the geometric information in the shallower scales, the structural features in the middle scales, and the semantic cues in the deeper scales. In this work, we propose a Scale Adaptive Fusion Network (SAFNet) for RGB-D SOD which employs scale adaptive modules to fuse the RGB-D features. Specifically, for the shallow scale, we conduct the early fusion strategy by mapping the 2D RGB-D images to a 3D point cloud and learning a unified representation of the geometric information in the 3D space. For the middle scale, we model the structural features from multi-modalities by exploring spatial contrast information from the depth space. For the deep scale, we design a depth-aware channel-wise attention module to enhance the semantic representation of the two modalities. Extensive experiments demonstrate the superiority of the scale adaptive fusion strategy adopted by our method. The proposed SAFNet achieves favourable performance against state-of-the-art algorithms on six large-scale benchmarks.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Achanta, R., Hemami, S.S., Estrada, F.J., Süsstrunk, S.: Frequency-tuned salient region detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1597–1604 (2009)
Borji, A., Sihite, D.N., Itti, L.: Salient object detection: a benchmark. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7573, pp. 414–429. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33709-3_30
Chen, S., Fu, Y.: Progressively guided alternate refinement network for RGB-D salient object detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12353, pp. 520–538. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58598-3_31
Desingh, K., Krishna, K.M., Rajan, D., Jawahar, C.V.: Depth really matters: Improving visual salient region detection with depth. In: British Machine Vision Conference (2013)
Ding, Y., Liu, Z., Huang, M., Shi, R., Wang, X.: Depth-aware saliency detection using convolutional neural networks. J. Vis. Commun. Image Represent. 61, 1–9 (2019)
Fan, D., Cheng, M., Liu, Y., Li, T., Borji, A.: Structure-measure: a new way to evaluate foreground maps. In: IEEE International Conference on Computer Vision, pp. 4558–4567 (2017)
Fan, D., Gong, C., Cao, Y., Ren, B., Cheng, M., Borji, A.: Enhanced-alignment measure for binary foreground map evaluation. In: International Joint Conference on Artificial Intelligence, pp. 698–704 (2018)
Fan, D., et al.: Rethinking RGB-D salient object detection: models, datasets, and large-scale benchmarks. IEEE Trans. Neural Netw. Learn. Syst. 32(5), 2075–2089 (2020)
Garcia-Garcia, A., Gomez-Donoso, F., Rodríguez, J.G., Orts-Escolano, S., Cazorla, M., López, J.A.: Pointnet: a 3D convolutional neural network for real-time object class recognition. In: International Joint Conference on Neural Networks, pp. 1578–1584 (2016)
Han, J., Chen, H., Liu, N., Yan, C., Li, X.: CNNs-based RGB-D saliency detection via cross-view transfer and multiview fusion. IEEE Trans. Cybernatics 48(11), 3171–3183 (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hou, Q., Cheng, M., Hu, X., Borji, A., Tu, Z., Torr, P.H.S.: Deeply supervised salient object detection with short connections. IEEE Trans. Pattern Anal. Mach. Intell. 41(4), 815–828 (2019)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, vol. 37, pp. 448–456 (2015)
Ji, W., et al.: Calibrated RGB-D salient object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 9471–9481 (2021)
Jin, W., Xu, J., Han, Q., Zhang, Y., Cheng, M.: CDNet: Complementary depth network for RGB-D salient object detection. IEEE Trans. Image Process. 30, 3376–3390 (2021)
Ju, R., Liu, Y., Ren, T., Ge, L., Wu, G.: Depth-aware salient object detection using anisotropic center-surround difference. Sig. Process. Image Commun. 38, 115–126 (2015)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1106–1114 (2012)
Lang, C., Nguyen, T.V., Katti, H., Yadati, K., Kankanhalli, M., Yan, S.: Depth matters: influence of depth cues on visual saliency. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7573, pp. 101–115. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33709-3_8
Li, G., Liu, Z., Chen, M., Bai, Z., Lin, W., Ling, H.: Hierarchical alternate interaction network for RGB-D salient object detection. IEEE Trans. Image Process. 30, 3528–3542 (2021)
Li, G., Liu, Z., Ye, L., Wang, Y., Ling, H.: Cross-modal weighting network for RGB-D salient object detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12362, pp. 665–681. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58520-4_39
Li, N., Ye, J., Ji, Y., Ling, H., Yu, J.: Saliency detection on light field. IEEE Trans. Pattern Anal. Mach. Intell. 39(8), 1605–1616 (2017)
Liu, N., Zhang, N., Han, J.: Learning selective self-mutual attention for RGB-D saliency detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 13753–13762 (2020)
Liu, S., Huang, D., Wang, Y.: Receptive field block net for accurate and fast object detection. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 404–419. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_24
Liu, Z., Shi, S., Duan, Q., Zhang, W., Zhao, P.: Salient object detection for RGB-D image by single stream recurrent convolution neural network. Neurocomputing 363, 46–57 (2019)
Mahadevan, V., Vasconcelos, N.: Saliency-based discriminant tracking. In: Computer Vision and Pattern Recognition, pp. 1007–1013 (2009)
Niu, Y., Geng, Y., Li, X., Liu, F.: Leveraging stereopsis for saliency analysis. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 454–461 (2012)
Pang, Y., Zhang, L., Zhao, X., Lu, H.: Hierarchical dynamic filtering network for RGB-D salient object detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12370, pp. 235–252. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58595-2_15
Peng, H., Li, B., Xiong, W., Hu, W., Ji, R.: RGBD salient object detection: a benchmark and algorithms. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 92–109. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_7
Piao, Y., Ji, W., Li, J., Zhang, M., Lu, H.: Depth-induced multi-scale recurrent attention network for saliency detection. In: IEEE International Conference on Computer Vision, pp. 7253–7262 (2019)
Piao, Y., Rong, Z., Zhang, M., Ren, W., Lu, H.: A2dele: adaptive and attentive depth distiller for efficient RGB-D salient object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 9057–9066 (2020)
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017)
Quo, J., Ren, T., Bei, J.: Salient object detection for RGB-D image via saliency evolution. In: IEEE International Conference on Multimedia and Expo, pp. 1–6 (2016)
Ren, Z., Gao, S., Chia, L., Tsang, I.W.: Region-based saliency detection and its application in object recognition. IEEE Trans. Circuits Syst. Video Technol. 24(5), 769–779 (2014)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
Sun, P., Zhang, W., Wang, H., Li, S., Li, X.: Deep RGB-D saliency detection with depth-sensitive attention and automatic multi-modal fusion. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1407–1417 (2021)
Ulyanov, D., Vedaldi, A., Lempitsky, V.S.: Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022 (2016)
Wu, Z., Su, L., Huang, Q.: Cascaded partial decoder for fast and accurate salient object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3907–3916 (2019)
Yan, S., Song, X., Yu, C.: SDCNet: size divide and conquer network for salient object detection. In: Asian Conference on Computer Vision, pp. 637–653 (2020)
Yang, S., Lin, W., Lin, G., Jiang, Q., Liu, Z.: Progressive self-guided loss for salient object detection. IEEE Trans. Image Process. 30, 8426–8438 (2021)
Zhang, C., et al.: Cross-modality discrepant interaction network for RGB-D salient object detection. In: ACM Multimedia, pp. 2094–2102 (2021)
Zhang, J., et al.: UC-Net: uncertainty inspired RGB-D saliency detection via conditional variational autoencoders. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 8579–8588 (2020)
Zhang, J., et al.: RGB-D saliency detection via cascaded mutual information minimization. In: IEEE International Conference on Computer Vision, pp. 4318–4327 (2021)
Zhang, M., Fei, S.X., Liu, J., Xu, S., Piao, Y., Lu, H.: Asymmetric two-stream architecture for accurate RGB-D saliency detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12373, pp. 374–390. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58604-1_23
Zhao, X., Pang, Y., Zhang, L., Lu, H., Ruan, X.: Self-supervised pretraining for RGB-D salient object detection. In: Association for the Advancement of Artificial Intelligence, pp. 3463–3471 (2022)
Zhao, X., Zhang, L., Pang, Y., Lu, H., Zhang, L.: A single stream network for robust and real-time RGB-D salient object detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12367, pp. 646–662. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58542-6_39
Zhou, T., Fu, H., Chen, G., Zhou, Y., Fan, D., Shao, L.: Specificity-preserving RGB-D saliency detection. In: IEEE International Conference on Computer Vision, pp. 4661–4671 (2021)
Zhu, C., Cai, X., Huang, K., Li, T.H., Li, G.: PDNet: prior-model guided depth-enhanced network for salient object detection. In: IEEE International Conference on Multimedia and Expo, pp. 199–204 (2019)
Zhu, J., Wu, J., Xu, Y., Chang, E.I., Tu, Z.: Unsupervised object class discovery via saliency-guided multiple class learning. IEEE Trans. Pattern Anal. Mach. Intell. 37(4), 862–875 (2015)
Acknowledgements
This work is supported by the Ministry of Science and Technology of the People’s Republic of China no. 2018AAA0102003, National Natural Science Foundation of China under Grant no. 62006037, and the Fundamental Research Funds for the Central Universities no. DUT22JC06.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kong, Y., Zheng, Y., Yao, C., Liu, Y., Wang, H. (2023). Scale Adaptive Fusion Network for RGB-D Salient Object Detection. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13843. Springer, Cham. https://doi.org/10.1007/978-3-031-26313-2_37
Download citation
DOI: https://doi.org/10.1007/978-3-031-26313-2_37
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26312-5
Online ISBN: 978-3-031-26313-2
eBook Packages: Computer ScienceComputer Science (R0)