Abstract
RGB-D salient object detection (SOD) aims to detect salient objects by fusing the saliency cues in RGB and depth images. Although the cross-modal fusion strategies employed by existing RGB-D SOD models can effectively combine information from different modalities, most of them ignore the contextual information that is gradually diluted during the feature fusion process, and they fail to fully exploit the common and global information shared by the two modalities. In addition, although the decoding strategies adopted by existing RGB-D SOD models can effectively decode multi-level fused features, most of them neither fully mine the semantic information in the high-level fused features and the detail information in the low-level fused features, nor fully use this information to steer the decoding, which leads to saliency maps with incomplete structures and impoverished details. To overcome these problems, we propose a feature complementary fusion and information-guided network (FCFIG-Net) for RGB-D SOD, which consists of a feature complementary fusion encoder and an information-guided decoder. In FCFIG-Net, the two components cooperate to enhance and fuse the multi-modal features and the contextual information during encoding, and to fully exploit the semantic and detail information in the features during decoding. Concretely, we first design a feature complementary enhancement module (FCEM), which strengthens the representational capability of the features from each modality by exploiting the complementarity between them.
Then, to replenish the contextual information that is gradually diluted during feature fusion, we design a global contextual information extraction module (GCIEM) that extracts global contextual information from the deep encoded features. Furthermore, we design a multi-modal feature fusion module (MFFM), which sufficiently fuses the bimodal features and the global contextual information after fully mining and enhancing the common and global information contained in the bimodal features. Using FCEM, GCIEM, MFFM, and a ResNet50 backbone, we construct the feature complementary fusion encoder. In addition, we design a guidance decoding unit (GDU). Finally, combining GDU with an existing cascaded decoder, we build an information-guided decoder (IGD), which achieves high-quality step-by-step decoding of the multi-level fused features by fully utilizing the semantic information in the high-level fused features and the detail information in the low-level fused features. Extensive experiments on six widely used RGB-D datasets show that FCFIG-Net achieves state-of-the-art performance.
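The abstract only names the modules and their data flow, not their internals, so the following is a minimal sketch of that encoder-decoder composition under loudly stated assumptions: elementwise cross-modal enhancement for FCEM, an averaged global context for GCIEM, additive fusion for MFFM, and multiplicative guidance for GDU are all illustrative stand-ins, with toy 1-D "feature maps" in place of CNN tensors.

```python
# Hypothetical sketch of the FCFIG-Net data flow described in the abstract.
# Every operation below is an illustrative assumption, not the paper's design.

def fcem(rgb, depth):
    """Feature complementary enhancement: each modality is boosted by the
    other (assumed elementwise cross-modal enhancement)."""
    rgb_e = [r + r * d for r, d in zip(rgb, depth)]
    depth_e = [d + d * r for r, d in zip(rgb, depth)]
    return rgb_e, depth_e

def gciem(deep_feat):
    """Global contextual information extraction: assumed here to be a
    global average over the deepest encoded feature."""
    return sum(deep_feat) / len(deep_feat)

def mffm(rgb_e, depth_e, ctx):
    """Multi-modal feature fusion: common information (product term) plus
    both modalities, with the global context re-injected (assumption)."""
    return [r + d + r * d + ctx for r, d in zip(rgb_e, depth_e)]

def gdu(low, high):
    """Guidance decoding unit: high-level semantics gate low-level detail
    (assumption)."""
    return [lo * hi + hi for lo, hi in zip(low, high)]

# Four encoder levels per modality, each a toy 1-D "feature map".
rgb_levels = [[0.2 * i + lvl for i in range(4)] for lvl in range(4)]
depth_levels = [[0.1 * i + lvl for i in range(4)] for lvl in range(4)]

# Encoder: extract global context from the deepest level, then enhance
# and fuse the bimodal features at every level.
deep_rgb_e, _ = fcem(rgb_levels[-1], depth_levels[-1])
ctx = gciem(deep_rgb_e)
fused = []
for rgb, depth in zip(rgb_levels, depth_levels):
    rgb_e, depth_e = fcem(rgb, depth)
    fused.append(mffm(rgb_e, depth_e, ctx))

# Decoder: top-down cascade in which each step is guided by the
# already-decoded higher level.
decoded = fused[-1]
for level in reversed(fused[:-1]):
    decoded = gdu(level, decoded)
# decoded is a single fused "saliency map" of length 4.
```

The point of the sketch is only the wiring: context is computed once from the deepest features and injected at every fusion step, and decoding proceeds level by level under guidance from above, matching the step-by-step decoding the abstract describes.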









Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported in part by the Science and Technology Development Plan Project of Henan Province, China (No. 222102110135).
Author information
Authors and Affiliations
Contributions
Haishun Du is responsible for methodology, writing, original draft preparation, reviewing, editing, supervision and funding acquisition. Kangyi Qiao is responsible for conceptualization, writing, original draft preparation, reviewing and editing. Wenzhe Zhang is responsible for investigation, software and validation. Zhengyang Zhang is responsible for software, visualization and data curation. Sen Wang is responsible for software, visualization and data curation.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Du, H., Qiao, K., Zhang, W. et al. FCFIG-Net: feature complementary fusion and information-guided network for RGB-D salient object detection. SIViP 18, 8547–8563 (2024). https://doi.org/10.1007/s11760-024-03489-3