Cross-modal and multi-level feature refinement network for RGB-D salient object detection

Gao, Yue; Dai, Meng; Zhang, Qing

doi:10.1007/s00371-022-02543-w

Cross-modal and multi-level feature refinement network for RGB-D salient object detection

Original article
Published: 30 June 2022

Volume 39, pages 3979–3994, (2023)
Cite this article

The Visual Computer Aims and scope Submit manuscript

Yue Gao¹,
Meng Dai¹ &
Qing Zhang¹

441 Accesses
7 Citations
Explore all metrics

Abstract

RGB-D salient object detection (SOD) methods adopt depth maps as important supplementary information in order to identify salient objects more accurately. However, there are still two main challenges in the existing RGB-D SOD methods. One typical issue is how to obtain effective cross-modal features, and another issue is how to optimize the integration of multi-level features. To tackle these two issues, we propose a novel cross-modal and multi-level feature refinement network which equips with a cross-modal feature interaction module and a multi-level feature fusion module. Specifically, a cross-modal feature interaction module is designed to enhance depth features from both channel and spatial perspectives and then effectively integrate cross-modal features. Moreover, considering the characteristics of different levels of features, we propose a multi-level feature fusion module which combines contextual information from multi-level features by means of skip connection. Extensive experiments on five benchmark datasets demonstrate that our proposed model outperforms other 17 state-of-the-art RGB-D SOD methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

SSD: Single Shot MultiBox Detector

Attention mechanisms in computer vision: A survey

Article Open access 15 March 2022

SES-YOLOv8n: automatic driving object detection algorithm based on improved YOLOv8

Article 22 March 2024

References

Hong, S., You, T., Kwak, S., Han, B.: Online tracking by learning discriminative saliency map with convolutional neural network. In: International Conference on Machine Learning, pp. 597–606 (2015). PMLR
Tsai, C.-C., Li, W., Hsu, K.-J., Qian, X., Lin, Y.-Y.: Image co-saliency detection and co-segmentation via progressive joint optimization. IEEE Trans. Image Process. 28(1), 56–71 (2018)
Article MathSciNet MATH Google Scholar
Fan, D.-P., Wang, W., Cheng, M.-M., Shen, J.: Shifting more attention to video salient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8554–8564 (2019)
Craye, C., Filliat, D., Goudou, J.-F.: Environment exploration for object-based visual saliency learning. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 2303–2309 (2016). IEEE
Liu, G., Fan, D.: A model of visual attention for natural image retrieval. In: 2013 International Conference on Information Science and Cloud Computing Companion, pp. 728–733 (2013). IEEE
Qin, X., Zhang, Z., Huang, C., Gao, C., Dehghan, M., Jagersand, M.: Basnet: Boundary-aware salient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7479–7489 (2019)
Fan, D.-P., Zhai, Y., Borji, A., Yang, J., Shao, L.: Bbs-net: Rgb-d salient object detection with a bifurcated backbone strategy network. In: European Conference on Computer Vision, pp. 275–292 (2020). Springer
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Zhang, X., Wang, T., Qi, J., Lu, H., Wang, G.: Progressive attention guided recurrent network for salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 714–722 (2018)
Zhang, J., Yu, X., Li, A., Song, P., Liu, B., Dai, Y.: Weakly-supervised salient object detection via scribble annotations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12546–12555 (2020)
Pang, Y., Zhao, X., Zhang, L., Lu, H.: Multi-scale interactive network for salient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9413–9422 (2020)
Cong, R., Lei, J., Zhang, C., Huang, Q., Cao, X., Hou, C.: Saliency detection for stereoscopic images based on depth confidence analysis and multiple cues fusion. IEEE Signal Process. Lett. 23(6), 819–823 (2016)
Article Google Scholar
Liu, Z., Duan, Q., Shi, S., Zhao, P.: Multi-level progressive parallel attention guided salient object detection for RGB-D images. Vis. Comput. 37(3), 529–540 (2021)
Article Google Scholar
Han, J., Chen, H., Liu, N., Yan, C., Li, X.: Cnns-based RGB-D saliency detection via cross-view transfer and multiview fusion. IEEE Trans. Cybern. 48(11), 3171–3183 (2017)
Article Google Scholar
Qu, L., He, S., Zhang, J., Tian, J., Tang, Y., Yang, Q.: RGBD salient object detection via deep fusion. IEEE Trans. Image Process. 26(5), 2274–2285 (2017)
Article MathSciNet MATH Google Scholar
Chen, H., Li, Y.: Progressively complementarity-aware fusion network for rgb-d salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3051–3060 (2018)
Chen, H., Li, Y., Su, D.: Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection. Pattern Recogn. 86, 376–385 (2019)
Article Google Scholar
Piao, Y., Ji, W., Li, J., Zhang, M., Lu, H.: Depth-induced multi-scale recurrent attention network for saliency detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7254–7263 (2019)
Zhu, C., Li, G., Wang, W., Wang, R.: An innovative salient object detection using center-dark channel prior. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 1509–1515 (2017)
Chen, H., Li, Y.: Three-stream attention-aware network for RGB-D salient object detection. IEEE Trans. Image Process. 28(6), 2825–2835 (2019)
Article MathSciNet MATH Google Scholar
Fan, D.-P., Lin, Z., Zhang, Z., Zhu, M., Cheng, M.-M.: Rethinking RGB-D salient object detection: models, data sets, and large-scale benchmarks. IEEE Trans. Neural Netw. Learn. Syst. 32(5), 2075–2089 (2020)
Article Google Scholar
Zhu, C., Cai, X., Huang, K., Li, T.H., Li, G.: Pdnet: Prior-model guided depth-enhanced network for salient object detection. In: 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 199–204 (2019). IEEE
Zhao, J.-X., Cao, Y., Fan, D.-P., Cheng, M.-M., Li, X.-Y., Zhang, L.: Contrast prior and fluid pyramid integration for RGBD salient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3927–3936 (2019)
Liu, Z., Shi, S., Duan, Q., Zhang, W., Zhao, P.: Salient object detection for RGB-D image by single stream recurrent convolution neural network. Neurocomputing 363, 46–57 (2019)
Article Google Scholar
Li, G., Liu, Z., Chen, M., Bai, Z., Lin, W., Ling, H.: Hierarchical alternate interaction network for RGB-D salient object detection. IEEE Trans. Image Process. 30, 3528–3542 (2021)
Article Google Scholar
Zhang, H., Lei, J., Fan, X., Wu, M., Zhang, P., Bu, S.: Depth combined saliency detection based on region contrast model. In: 2012 7th International Conference on Computer Science & Education (ICCSE), pp. 763–766 (2012). IEEE
Desingh, K., Krishna, K.M., Rajan, D., Jawahar, C.: Depth really matters: Improving visual salient region detection with depth. In: BMVC, pp. 1–11 (2013)
Ciptadi, A., Hermans, T., Rehg, J.M.: An in depth view of saliency. (2013). Georgia Institute of Technology
Song, H., Liu, Z., Du, H., Sun, G., Le Meur, O., Ren, T.: Depth-aware salient object detection and segmentation via multiscale discriminative saliency fusion and bootstrap learning. IEEE Trans. Image Process. 26(9), 4204–4216 (2017)
Article MathSciNet MATH Google Scholar
Ren, J., Gong, X., Yu, L., Zhou, W., Ying Yang, M.: Exploiting global priors for rgb-d saliency detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 25–32 (2015)
Shigematsu, R., Feng, D., You, S., Barnes, N.: Learning rgb-d salient object detection using background enclosure, depth contrast, and top-down features. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 2749–2757 (2017)
Peng, H., Li, B., Xiong, W., Hu, W., Ji, R.: Rgbd salient object detection: A benchmark and algorithms. In: European Conference on Computer Vision, pp. 92–109 (2014). Springer
Fan, X., Liu, Z., Sun, G.: Salient region detection for stereoscopic images. In: 2014 19th International Conference on Digital Signal Processing, pp. 454–458 (2014). IEEE
Fang, Y., Wang, J., Narwaria, M., Le Callet, P., Lin, W.: Saliency detection for stereoscopic images. IEEE Trans. Image Process. 23(6), 2625–2636 (2014)
Article MathSciNet MATH Google Scholar
Feng, D., Barnes, N., You, S., McCarthy, C.: Local background enclosure for RGB-D salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2343–2350 (2016)
Debelee, T.G., Gebreselasie, A., Schwenker, F., Amirian, M., Yohannes, D.: Classification of mammograms using texture and CNN based extracted features. In: Journal of Biomimetics, Biomaterials and Biomedical Engineering, vol. 42, pp. 79–97 (2019). Trans Tech Publ
Agrawal, A., Mittal, N.: Using CNN for facial expression recognition: a study of the effects of kernel size and number of filters on accuracy. Vis. Comput. 36(2), 405–412 (2020)
Article Google Scholar
Li, X., Huang, H., Zhao, H., Wang, Y., Hu, M.: Learning a convolutional neural network for propagation-based stereo image segmentation. Vis. Comput. 36(1), 39–52 (2020)
Article Google Scholar
Fu, K., Fan, D.-P., Ji, G.-P., Zhao, Q.: Jl-dcf: Joint learning and densely-cooperative fusion framework for rgb-d salient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3052–3062 (2020)
Mozaffari, M.H., Lee, W.-S.: Semantic segmentation with peripheral vision. In: International Symposium on Visual Computing, pp. 421–429 (2020). Springer
Zhang, M., Ren, W., Piao, Y., Rong, Z., Lu, H.: Select, supplement and focus for rgb-d saliency detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3472–3481 (2020)
Luo, A., Li, X., Yang, F., Jiao, Z., Cheng, H., Lyu, S.: Cascade graph neural networks for rgb-d salient object detection. In: European Conference on Computer Vision, pp. 346–364 (2020). Springer
Ji, W., Li, J., Zhang, M., Piao, Y., Lu, H.: Accurate rgb-d salient object detection via collaborative learning. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVIII 16, pp. 52–69 (2020). Springer
Zhao, X., Zhang, L., Pang, Y., Lu, H., Zhang, L.: A single stream network for robust and real-time rgb-d salient object detection. In: European Conference on Computer Vision, pp. 646–662 (2020). Springer
Jiang, B., Zhou, Z., Wang, X., Tang, J., Luo, B.: Cmsalgan: RGB-D salient object detection with cross-view generative adversarial networks. IEEE Trans. Multimed. 23, 1343–1353 (2020)
Article Google Scholar
Zhang, Z., Lin, Z., Xu, J., Jin, W.-D., Lu, S.-P., Fan, D.-P.: Bilateral attention network for RGB-D salient object detection. IEEE Trans. Image Process. 30, 1949–1961 (2021)
Article Google Scholar
Pang, Y., Zhang, L., Zhao, X., Lu, H.: Hierarchical dynamic filtering network for rgb-d salient object detection. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16, pp. 235–252 (2020). Springer
Liu, N., Zhang, N., Han, J.: Learning selective self-mutual attention for RGB-D saliency detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13756–13765 (2020)
Zhao, J., Zhao, Y., Li, J., Chen, X.: Is depth really necessary for salient object detection? In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1745–1754 (2020)
Chen, Z., Cong, R., Xu, Q., Huang, Q.: Dpanet: Depth potentiality-aware gated attention network for RGB-D salient object detection. IEEE Trans. Image Process. (2020)
Chen, S., Fu, Y.: Progressively guided alternate refinement network for RGB-D salient object detection. In: European Conference on Computer Vision, pp. 520–538 (2020). Springer
Chen, C., Wei, J., Peng, C., Qin, H.: Depth-quality-aware salient object detection. IEEE Trans. Image Process. 30, 2350–2363 (2021)
Article Google Scholar
Ji, W., Li, J., Yu, S., Zhang, M., Piao, Y., Yao, S., Bi, Q., Ma, K., Zheng, Y., Lu, H., et al.: Calibrated RGB-D salient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9471–9481 (2021)
Zhang, W., Ji, G.-P., Wang, Z., Fu, K., Zhao, Q.: Depth quality-inspired feature manipulation for efficient RGB-D salient object detection. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 731–740 (2021)
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFS. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
Article Google Scholar
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: Cbam: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)
De Boer, P.-T., Kroese, D.P., Mannor, S., Rubinstein, R.Y.: A tutorial on the cross-entropy method. Ann. Oper. Res. 134(1), 19–67 (2005)
Article MathSciNet MATH Google Scholar
Máttyus, G., Luo, W., Urtasun, R.: Deeproadmapper: Extracting road topology from aerial images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3438–3446 (2017)
Ju, R., Ge, L., Geng, W., Ren, T., Wu, G.: Depth saliency based on anisotropic center-surround difference. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 1115–1119 (2014). IEEE
Cheng, Y., Fu, H., Wei, X., Xiao, J., Cao, X.: Depth enhanced saliency detection method. In: Proceedings of International Conference on Internet Multimedia Computing and Service, pp. 23–27 (2014)
Li, N., Ye, J., Ji, Y., Ling, H., Yu, J.: Saliency detection on light field. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2806–2813 (2014)
Perazzi, F., Krähenbühl, P., Pritch, Y., Hornung, A.: Saliency filters: Contrast based filtering for salient region detection. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 733–740 (2012). IEEE
Achanta, R., Hemami, S., Estrada, F.J., Susstrunk, S.: Frequency-tuned salient region detection. In: Proceedings of International Conference on Computer Vision and Pattern Recognition, pp. 1597–1604 (2009)
Margolin, R., Zelnik-Manor, L., Tal, A.: How to evaluate foreground maps. In: Proceedings of International Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2014)
Fan, D., Cheng, M., Liu, Y., Li, T., Botji, A.: Structure-measure: A new way to evaluate foreground maps. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 4548–4557 (2017)
Achanta, R., Hemami, S., Estrada, F., Susstrunk, S.: Frequency-tuned salient region detection. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1597–1604 (2009). IEEE
Tu, Z., Li, Z., Li, C., Lang, Y., Tang, J.: Multi-interactive dual-decoder for rgb-thermal salient object detection. IEEE Trans. Image Process. (2021)

Download references

Acknowledgements

This work is supported by Natural Science Foundation of Shanghai under Grant Nos. 19ZR1455300 and 21ZR1462600, and National Natural Science Foundation of China Under Grant No. 61806126.

Author information

Authors and Affiliations

School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai, China
Yue Gao, Meng Dai & Qing Zhang

Authors

Yue Gao
View author publications
You can also search for this author in PubMed Google Scholar
Meng Dai
View author publications
You can also search for this author in PubMed Google Scholar
Qing Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Meng Dai.

Ethics declarations

Conflict of interest

We declare that we have no competing financial interests or personal relationships that could have appeared to influence our work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Gao, Y., Dai, M. & Zhang, Q. Cross-modal and multi-level feature refinement network for RGB-D salient object detection. Vis Comput 39, 3979–3994 (2023). https://doi.org/10.1007/s00371-022-02543-w

Download citation

Accepted: 18 May 2022
Published: 30 June 2022
Issue Date: September 2023
DOI: https://doi.org/10.1007/s00371-022-02543-w

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Cross-modal and multi-level feature refinement network for RGB-D salient object detection

Abstract

Access this article

Similar content being viewed by others

SSD: Single Shot MultiBox Detector

Attention mechanisms in computer vision: A survey

SES-YOLOv8n: automatic driving object detection algorithm based on improved YOLOv8

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Cross-modal and multi-level feature refinement network for RGB-D salient object detection

Abstract

Access this article

Similar content being viewed by others

SSD: Single Shot MultiBox Detector

Attention mechanisms in computer vision: A survey

SES-YOLOv8n: automatic driving object detection algorithm based on improved YOLOv8

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation