CSNet: a ConvNeXt-based Siamese network for RGB-D salient object detection

Zhang, Yunhua; Wang, Hangxu; Yang, Gang; Zhang, Jianhao; Gong, Congjin; Wang, Yutao

doi:10.1007/s00371-023-02887-x

Original article
Published: 23 May 2023

Volume 40, pages 1805–1823 (2024)

The Visual Computer

Yunhua Zhang¹,² (equal contribution),
Hangxu Wang¹ (equal contribution),
Gang Yang¹ (ORCID: orcid.org/0000-0002-4873-9205),
Jianhao Zhang¹,
Congjin Gong¹ &
Yutao Wang¹

Abstract

Global context is critical for locating salient objects in salient object detection (SOD). However, the convolution operation in CNNs has a local receptive field and therefore cannot capture long-range global information. Recent studies have shown that modernized CNN models with large-kernel convolutions, such as ConvNeXt, can effectively extend the receptive field. Building on this finding, this paper explores the potential of large-kernel CNNs for the SOD task. Inspired by the information that RGB and depth images share about salient objects, we propose a ConvNeXt-based Siamese network with shared weight parameters. This structural design can effectively reduce the number of parameters without sacrificing performance. Furthermore, a depth information preprocessing module is proposed to minimize the impact of low-quality depth images on the predicted saliency maps. For cross-modal feature interaction, a dynamic fusion module is designed to dynamically enhance cross-modal complementarity. Extensive experiments on six benchmark datasets demonstrate the outstanding performance of the proposed method against 14 state-of-the-art RGB-D methods. Our code will be released at https://github.com/zyh5119232/CSNet.
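The weight-sharing idea in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: `TinyEncoder` and `siamese_forward` are hypothetical names, a single linear layer stands in for the ConvNeXt backbone, and replicating the depth value to three channels is an assumption made here so that one encoder can accept both inputs. The sketch only shows why reusing one encoder for both modalities halves the encoder parameter count compared with a two-stream design.

```python
import random


class TinyEncoder:
    """Hypothetical stand-in for a backbone: one linear layer, y = W x + b."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.weight = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x):
        # Plain matrix-vector product plus bias.
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]

    def num_params(self):
        return sum(len(row) for row in self.weight) + len(self.bias)


def siamese_forward(encoder, rgb, depth):
    """Run the SAME encoder object (shared weights) on both modalities."""
    # Assumption: the single depth value is replicated to three channels
    # so that it matches the RGB input size.
    depth3 = [depth[0]] * 3
    return encoder(rgb), encoder(depth3)


shared = TinyEncoder(in_dim=3, out_dim=4)
f_rgb, f_depth = siamese_forward(shared, [0.2, 0.5, 0.1], [0.7])

# Siamese design: one encoder is reused for both inputs.
siamese_params = shared.num_params()
# Two-stream design: each modality has its own encoder.
two_stream_params = 2 * TinyEncoder(in_dim=3, out_dim=4).num_params()

print(siamese_params, two_stream_params)
```

In this toy setting the shared encoder has half the parameters of the two-stream variant; whether accuracy is preserved is an empirical claim of the paper and is not demonstrated by the sketch.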

Data availability

Data will be made available on reasonable request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant Number 62076058).

Author information

Yunhua Zhang and Hangxu Wang have contributed equally to this work.

Authors and Affiliations

Northeastern University, Shenyang, 110819, China
Yunhua Zhang, Hangxu Wang, Gang Yang, Jianhao Zhang, Congjin Gong & Yutao Wang
DUT Artificial Intelligence Institute, Dalian, 116024, China
Yunhua Zhang

Corresponding author

Correspondence to Gang Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Wang, H., Yang, G. et al. CSNet: a ConvNeXt-based Siamese network for RGB-D salient object detection. Vis Comput 40, 1805–1823 (2024). https://doi.org/10.1007/s00371-023-02887-x

Accepted: 23 April 2023
Published: 23 May 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s00371-023-02887-x
