Abstract
RGB-D salient object detection (SOD) aims to detect salient objects by fusing the saliency cues in RGB and depth images. Although the cross-modal fusion strategies employed by existing RGB-D SOD models can effectively combine information from different modalities, most of them ignore the contextual information that is gradually diluted during the feature fusion process, and they fail to fully exploit the common and global information shared by the two modalities. In addition, although the decoding strategies adopted by existing RGB-D SOD models can effectively decode multi-level fused features, most of them neither fully mine the semantic information in the high-level fused features and the detail information in the low-level fused features, nor fully use this information to steer the decoding, which leads to saliency maps with incomplete structures and impoverished details. To overcome these problems, we propose a feature complementary fusion and information-guided network (FCFIG-Net) for RGB-D SOD, which consists of a feature complementary fusion encoder and an information-guided decoder. In FCFIG-Net, the two components cooperate to enhance and fuse the multi-modal features and the contextual information during encoding, and to fully exploit the semantic and detail information in the features during decoding. Concretely, we first design a feature complementary enhancement module (FCEM), which strengthens the representational capability of the features from each modality by exploiting the complementarity between them.
Then, to replenish the contextual information that is gradually diluted during feature fusion, we design a global contextual information extraction module (GCIEM) that extracts global contextual information from the deep encoded features. Furthermore, we design a multi-modal feature fusion module (MFFM), which sufficiently fuses the bimodal features and the global contextual information after fully mining and enhancing the common and global information contained in the bimodal features. Using FCEM, GCIEM, MFFM, and a ResNet50 backbone, we construct the feature complementary fusion encoder. In addition, we design a guidance decoding unit (GDU). Finally, combining GDU with an existing cascaded decoder, we build an information-guided decoder (IGD), which achieves high-quality step-by-step decoding of the multi-level fused features by fully utilizing the semantic information in the high-level fused features and the detail information in the low-level fused features. Extensive experiments on six widely used RGB-D datasets show that FCFIG-Net achieves state-of-the-art performance.
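The abstract only names the modules and their data flow, not their internals, so the following is a minimal sketch of that encoder-decoder composition under loudly stated assumptions: elementwise cross-modal enhancement for FCEM, an averaged global context for GCIEM, additive fusion for MFFM, and multiplicative guidance for GDU are all illustrative stand-ins, with toy 1-D "feature maps" in place of CNN tensors.

```python
# Hypothetical sketch of the FCFIG-Net data flow described in the abstract.
# Every operation below is an illustrative assumption, not the paper's design.

def fcem(rgb, depth):
    """Feature complementary enhancement: each modality is boosted by the
    other (assumed elementwise cross-modal enhancement)."""
    rgb_e = [r + r * d for r, d in zip(rgb, depth)]
    depth_e = [d + d * r for r, d in zip(rgb, depth)]
    return rgb_e, depth_e

def gciem(deep_feat):
    """Global contextual information extraction: assumed here to be a
    global average over the deepest encoded feature."""
    return sum(deep_feat) / len(deep_feat)

def mffm(rgb_e, depth_e, ctx):
    """Multi-modal feature fusion: common information (product term) plus
    both modalities, with the global context re-injected (assumption)."""
    return [r + d + r * d + ctx for r, d in zip(rgb_e, depth_e)]

def gdu(low, high):
    """Guidance decoding unit: high-level semantics gate low-level detail
    (assumption)."""
    return [lo * hi + hi for lo, hi in zip(low, high)]

# Four encoder levels per modality, each a toy 1-D "feature map".
rgb_levels = [[0.2 * i + lvl for i in range(4)] for lvl in range(4)]
depth_levels = [[0.1 * i + lvl for i in range(4)] for lvl in range(4)]

# Encoder: extract global context from the deepest level, then enhance
# and fuse the bimodal features at every level.
deep_rgb_e, _ = fcem(rgb_levels[-1], depth_levels[-1])
ctx = gciem(deep_rgb_e)
fused = []
for rgb, depth in zip(rgb_levels, depth_levels):
    rgb_e, depth_e = fcem(rgb, depth)
    fused.append(mffm(rgb_e, depth_e, ctx))

# Decoder: top-down cascade in which each step is guided by the
# already-decoded higher level.
decoded = fused[-1]
for level in reversed(fused[:-1]):
    decoded = gdu(level, decoded)
# decoded is a single fused "saliency map" of length 4.
```

The point of the sketch is only the wiring: context is computed once from the deepest features and injected at every fusion step, and decoding proceeds level by level under guidance from above, matching the step-by-step decoding the abstract describes.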









Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported in part by the Science and Technology Development Plan Project of Henan Province, China (No. 222102110135).
Author information
Authors and Affiliations
Contributions
Haishun Du is responsible for methodology, writing, original draft preparation, reviewing, editing, supervision and funding acquisition. Kangyi Qiao is responsible for conceptualization, writing, original draft preparation, reviewing and editing. Wenzhe Zhang is responsible for investigation, software and validation. Zhengyang Zhang is responsible for software, visualization and data curation. Sen Wang is responsible for software, visualization and data curation.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Du, H., Qiao, K., Zhang, W. et al. FCFIG-Net: feature complementary fusion and information-guided network for RGB-D salient object detection. SIViP 18, 8547–8563 (2024). https://doi.org/10.1007/s11760-024-03489-3