MBDNet: Mitigating the “Under-Training Issue” in Dual-Encoder Model for RGB-d Salient Object Detection

Wang, Shuo; Yang, Gang; Zhang, Yunhua; Xu, Qiqi; Wang, Yutao

doi:10.1007/978-981-99-4761-4_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14090)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Existing RGB-D salient object detection methods generally rely on a dual-encoder structure for RGB and depth feature extraction. However, we observe that the encoders in such models are often not adequately trained to obtain superior feature representations. We name this problem the “under-training issue”. To mitigate it, we propose a multi-branch decoding network (MBDNet). MBDNet introduces additional supervised decoding branches to form a multi-branch decoding (MBD) structure, which facilitates the training of the encoders and enhances their feature representations. Specifically, to ensure the effectiveness of the introduced supervision and improve the performance of the additional decoding branches, we design an adaptive multi-scale decoding (AMSD) module. We also design a multi-branch feature aggregation (MBFA) module that aggregates the multi-branch features in MBD to further improve detection accuracy. In addition, we design an enhancement complement fusion (ECF) module to achieve multi-modality feature fusion. Extensive experiments demonstrate that MBDNet outperforms other state-of-the-art methods and mitigates the “under-training issue”.
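
The idea of supervising several decoding branches at once can be illustrated with a small sketch. The abstract does not state which loss MBDNet uses or how the branches are weighted, so everything below is an assumption made for illustration: the loss is plain binary cross-entropy, the branches are weighted equally by default, and the names `bce` and `multi_branch_loss` are hypothetical rather than taken from the paper.

```python
import math

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels.

    Assumption: BCE is used here only as a common saliency loss; the
    abstract does not name the loss used by MBDNet.
    """
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)

def multi_branch_loss(branch_preds, target, weights=None):
    """Weighted sum of per-branch losses against the same ground truth.

    Each decoding branch contributes its own supervised term, so every
    branch sends a training signal back to the shared encoders.
    Assumption: equal weights unless `weights` is given.
    """
    if weights is None:
        weights = [1.0] * len(branch_preds)
    return sum(w * bce(p, target) for w, p in zip(weights, branch_preds))

# Toy example: a flattened 4-pixel saliency map and two decoding branches.
ground_truth = [1, 0, 1, 0]
main_branch = [0.9, 0.1, 0.8, 0.2]
extra_branch = [0.7, 0.3, 0.6, 0.4]
loss = multi_branch_loss([main_branch, extra_branch], ground_truth)
```

In a real dual-encoder model the per-branch terms would be computed on tensors by a deep learning framework; the point of the sketch is only that the total objective sums one supervised term per branch.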

Acknowledgment

This work is supported by the National Natural Science Foundation of China under Grant No. 62076058.

Author information

Authors and Affiliations

Northeastern University, Shenyang, 110819, China
Shuo Wang, Gang Yang, Yunhua Zhang, Qiqi Xu & Yutao Wang
DUT Artificial Intelligence Institute, Dalian, 116024, China
Shuo Wang

Corresponding author

Correspondence to Gang Yang.

Editor information

Editors and Affiliations

Department of Computer Science, Eastern Institute of Technology, Zhejiang, China
De-Shuang Huang
University of Wollongong, North Wollongong, NSW, Australia
Prashan Premaratne
Zhengzhou University of Light Industry, Zhengzhou, China
Baohua Jin
Zhong Yuan University of Technology, Zhengzhou, China
Boyang Qu
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Department of Computer Science, Liverpool John Moores University, Liverpool, UK
Abir Hussain

Cite this paper

Wang, S., Yang, G., Zhang, Y., Xu, Q., Wang, Y. (2023). MBDNet: Mitigating the “Under-Training Issue” in Dual-Encoder Model for RGB-d Salient Object Detection. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14090. Springer, Singapore. https://doi.org/10.1007/978-981-99-4761-4_9

DOI: https://doi.org/10.1007/978-981-99-4761-4_9
Published: 31 July 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4760-7
Online ISBN: 978-981-99-4761-4
eBook Packages: Computer Science, Computer Science (R0)
