Cross-modal and multi-level feature refinement network for RGB-D salient object detection

  • Original article
The Visual Computer

Abstract

RGB-D salient object detection (SOD) methods use depth maps as important supplementary information to identify salient objects more accurately. However, existing RGB-D SOD methods still face two main challenges: how to obtain effective cross-modal features, and how to optimize the integration of multi-level features. To tackle these two issues, we propose a novel cross-modal and multi-level feature refinement network equipped with a cross-modal feature interaction module and a multi-level feature fusion module. Specifically, the cross-modal feature interaction module enhances depth features from both the channel and spatial perspectives and then effectively integrates the cross-modal features. Moreover, considering the characteristics of features at different levels, the multi-level feature fusion module combines contextual information from multi-level features by means of skip connections. Extensive experiments on five benchmark datasets demonstrate that the proposed model outperforms 17 other state-of-the-art RGB-D SOD methods.
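
The abstract describes the two modules only at a high level, so the following is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' implementation. Depth features are enhanced with a parameter-free simplification of CBAM-style channel and spatial attention (Woo et al., ECCV 2018): CBAM's learned shared MLP and 7×7 convolution are replaced by plain sums so the code runs without trained weights. The residual fusion rule in `fuse_cross_modal` (rgb + rgb × enhanced depth) and the top-down decoder in `top_down_fusion` (2× nearest-neighbour upsampling added to the next shallower level through a skip connection) are hypothetical choices made for illustration. All function names are illustrative, and feature maps are nested lists (C × H × W) rather than framework tensors.

```python
import math
from typing import List

Map = List[List[float]]  # one H x W channel
Feature = List[Map]      # a C x H x W feature tensor as nested lists


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat: Feature) -> Feature:
    """Scale each channel by sigmoid(avg_pool + max_pool) of that channel.

    Assumption: CBAM feeds both pooled descriptors through a shared learned
    MLP; this sketch uses the identity instead so it runs without weights.
    """
    out = []
    for ch in feat:
        values = [v for row in ch for v in row]
        weight = sigmoid(sum(values) / len(values) + max(values))
        out.append([[v * weight for v in row] for row in ch])
    return out


def spatial_attention(feat: Feature) -> Feature:
    """Scale each pixel by sigmoid(channel-mean + channel-max) at that pixel.

    Assumption: CBAM applies a learned 7x7 convolution to the stacked
    [mean; max] maps; here the two maps are simply added.
    """
    h, w = len(feat[0]), len(feat[0][0])
    mask = [[sigmoid(sum(ch[i][j] for ch in feat) / len(feat)
                     + max(ch[i][j] for ch in feat))
             for j in range(w)]
            for i in range(h)]
    return [[[ch[i][j] * mask[i][j] for j in range(w)] for i in range(h)]
            for ch in feat]


def fuse_cross_modal(rgb: Feature, depth: Feature) -> Feature:
    """Hypothetical fusion: enhance depth (channel, then spatial attention),
    then return rgb + rgb * enhanced_depth element-wise."""
    enhanced = spatial_attention(channel_attention(depth))
    return [[[r + r * d for r, d in zip(r_row, d_row)]
             for r_row, d_row in zip(r_ch, d_ch)]
            for r_ch, d_ch in zip(rgb, enhanced)]


def upsample2x(ch: Map) -> Map:
    """Nearest-neighbour 2x upsampling of a single channel."""
    return [[v for v in row for _ in range(2)] for row in ch for _ in range(2)]


def top_down_fusion(levels: List[Feature]) -> Feature:
    """Hypothetical multi-level fusion with skip connections.

    levels[0] is the shallowest (largest) feature; each deeper level has half
    the spatial size and the same channel count. Starting from the deepest
    level, the running result is upsampled and added to the next shallower
    level through a skip connection.
    """
    fused = levels[-1]
    for skip in reversed(levels[:-1]):
        fused = [[[s + u for s, u in zip(s_row, u_row)]
                  for s_row, u_row in zip(s_ch, upsample2x(f_ch))]
                 for s_ch, f_ch in zip(skip, fused)]
    return fused


if __name__ == "__main__":
    rgb = [[[1.0, 2.0], [3.0, 4.0]]]       # 1 channel, 2 x 2
    depth = [[[0.5, 0.0], [1.0, 0.2]]]
    print(fuse_cross_modal(rgb, depth))
    pyramid = [[[[1.0] * 4 for _ in range(4)]],  # 1 x 4 x 4
               [[[1.0] * 2 for _ in range(2)]],  # 1 x 2 x 2
               [[[1.0]]]]                        # 1 x 1 x 1
    print(top_down_fusion(pyramid))              # every value is 3.0
```

Because both attention masks lie strictly between 0 and 1, the enhanced depth can only attenuate the raw depth response, and an all-zero depth map leaves the RGB features unchanged under this fusion rule. Replacing the identity stand-ins with learned layers would bring the attention steps closer to CBAM; how the paper's modules actually combine the two modalities is not stated in the abstract.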

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Acknowledgements

This work is supported by the Natural Science Foundation of Shanghai under Grant Nos. 19ZR1455300 and 21ZR1462600, and by the National Natural Science Foundation of China under Grant No. 61806126.

Author information

Corresponding author

Correspondence to Meng Dai.

Ethics declarations

Conflict of interest

We declare that we have no competing financial interests or personal relationships that could have appeared to influence our work.

About this article

Cite this article

Gao, Y., Dai, M. & Zhang, Q. Cross-modal and multi-level feature refinement network for RGB-D salient object detection. Vis Comput 39, 3979–3994 (2023). https://doi.org/10.1007/s00371-022-02543-w

  • DOI: https://doi.org/10.1007/s00371-022-02543-w
