Abstract
Existing RGB-D salient object detection methods generally rely on the dual-encoder structure for RGB and depth feature extraction. However, we observe that the encoders in such models are often not adequately trained to obtain superior feature representations. We name this problem the “under-training issue”. To this end, we propose a multi-branch decoding network (MBDNet) to suppress this issue. The MBDNet introduces additional decoding branches with supervision to form a multi-branch decoding (MBD) structure, facilitating the training of the encoders and enhancing the feature representation. Specifically, to ensure the effectiveness of the introduced supervision and improve the performance of additional decoding branches, we design an adaptive multi-scale decoding (AMSD) module. We also design a multi-branch feature aggregation (MBFA) module to aggregate the multi-branch features in MBD to further improve the detection accuracy. In addition, we design an enhancement complement fusion (ECF) module to achieve multi-modality feature fusion. Extensive experiments demonstrate that our MBDNet outperforms other state-of-the-art methods and mitigates the “under-training issue”.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Abdulmunem, A., Lai, Y.-K., Sun, X.: Saliency guided local and global descriptors for effective action recognition. Computational Visual Media 2(1), 97–106 (2016). https://doi.org/10.1007/s41095-016-0033-9
Achanta, R., Hemami, S., Estrada, F., Susstrunk, S.: Frequency-tuned salient region detection. In: 2009 IEEE conference on computer vision and pattern recognition, pp. 1597–1604. IEEE (2009)
Bi, H., Wu, R., Liu, Z., Zhu, H., Zhang, C., Xiang, T.Z.: Cross-modal hierarchical interaction network for rgb-d salient object detection. Pattern Recogn. 136, 109194 (2023)
Cadene, R., Dancette, C., Cord, M., Parikh, D., et al.: Rubi: Reducing unimodal biases for visual question answering. Advances in neural information processing systems 32 (2019)
Chen, H., Li, Y.: Progressively complementarity-aware fusion network for rgb-d salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3051–3060 (2018)
Cong, R., et al.: Cir-net: Cross-modality interaction and refinement for rgb-d salient object detection. IEEE Trans. Image Process. 31, 6800–6815 (2022)
Fan, D.P., Gong, C., Cao, Y., Ren, B., Cheng, M.M., Borji, A.: Enhanced-alignment measure for binary foreground map evaluation. arXiv preprint arXiv:1805.10421 (2018)
Fan, D.P., Lin, Z., Zhang, Z., Zhu, M., Cheng, M.M.: Rethinking rgb-d salient object detection: Models, data sets, and large-scale benchmarks. IEEE Transactions on Neural Networks and Learning Systems 32(5), 2075–2089 (2020)
Fu, K., Fan, D.P., Ji, G.P., Zhao, Q., Shen, J., Zhu, C.: Siamese network for rgb-d salient object detection and beyond. IEEE transactions on pattern analysis and machine intelligence (2021)
Gao, S.H., Cheng, M.M., Zhao, K., Zhang, X.Y., Yang, M.H., Torr, P.: Res2net: A new multi-scale backbone architecture. IEEE Trans. Pattern Anal. Mach. Intell. 43(2), 652–662 (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778 (2016)
Ji, W., et al.: Calibrated rgb-d salient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9471–9481 (2021)
Jin, W.D., Xu, J., Han, Q., Zhang, Y., Cheng, M.M.: Cdnet: Complementary depth network for rgb-d salient object detection. IEEE Trans. Image Process. 30, 3376–3390 (2021)
Ju, R., Ge, L., Geng, W., Ren, T., Wu, G.: Depth saliency based on anisotropic center-surround difference. In: 2014 IEEE international conference on image processing (ICIP), pp. 1115–1119. IEEE (2014)
Li, C., Cong, R., Piao, Y., Xu, Q., Loy, C.C.: Rgb-d salient object detection with cross-modality modulation and selection. In: European Conference on Computer Vision, pp. 225–241. Springer (2020)
Liu, N., Zhang, N., Han, J.: Learning selective self-mutual attention for rgb-d saliency detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 13756–13765 (2020)
Margolin, R., Zelnik-Manor, L., Tal, A.: How to evaluate foreground maps? In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 248–255 (2014)
Niu, Y., Geng, Y., Li, X., Liu, F.: Leveraging stereopsis for saliency analysis. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 454–461. IEEE (2012)
Pang, Y., Zhang, L., Zhao, X., Lu, H.: Hierarchical dynamic filtering network for rgb-d salient object detection. In: European Conference on Computer Vision, pp. 235–252. Springer (2020)
Peng, H., Li, B., Xiong, W., Hu, W., Ji, R.: Rgbd salient object detection: a benchmark and algorithms. In: European conference on computer vision, pp. 92–109. Springer (2014)
Perazzi, F., Krähenbühl, P., Pritch, Y., Hornung, A.: Saliency filters: contrast based filtering for salient region detection. In: 2012 IEEE conference on computer vision and pattern recognition, pp. 733–740. IEEE (2012)
Piao, Y., Ji, W., Li, J., Zhang, M., Lu, H.: Depth-induced multi-scale recurrent attention network for saliency detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7254–7263 (2019)
Sun, P., Zhang, W., Wang, H., Li, S., Li, X.: Deep rgb-d saliency detection with depth-sensitive attention and automatic multi-modal fusion. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 1407–1417 (2021)
Wang, F., Pan, J., Xu, S., Tang, J.: Learning discriminative cross-modality features for rgb-d saliency detection. IEEE Trans. Image Process. 31, 1285–1297 (2022)
Wang, W., Tran, D., Feiszli, M.: What makes training multi-modal classification networks hard? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12695–12705 (2020)
Wu, Z., Su, L., Huang, Q.: Cascaded partial decoder for fast and accurate salient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 3907–3916 (2019)
Yao, Z., Wang, L.: Erbanet: enhancing region and boundary awareness for salient object detection. Neurocomputing 448, 152–167 (2021)
Zhai, Y., et al.: Bifurcated backbone strategy for rgb-d salient object detection. IEEE Trans. Image Process. 30, 8727–8742 (2021)
Zhang, J., et al.: Uncertainty inspired rgb-d saliency detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021)
Zhang, W., Fu, K., Wang, Z., Ji, G.P., Zhao, Q.: Depth quality-inspired feature manipulation for efficient rgb-d and video salient object detection. arXiv preprint arXiv:2208.03918 (2022)
Zhou, B., Yang, G., Wan, X., Wang, Y., Liu, C., Wang, H.: A simple network with progressive structure for salient object detection. In: Chinese Conference on Pattern Recognition and Computer Vision (PRCV) (2021)
Zhou, J., Wang, L., Lu, H., Huang, K., Shi, X., Liu, B.: Mvsalnet: Multi-view augmentation for rgb-d salient object detection. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXIX, pp. 270–287. Springer (2022)
Acknowledgment
This work is supported by the National Natural Science Foundation of China under Grant No. 62076058.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, S., Yang, G., Zhang, Y., Xu, Q., Wang, Y. (2023). MBDNet: Mitigating the “Under-Training Issue” in Dual-Encoder Model for RGB-d Salient Object Detection. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science(), vol 14090. Springer, Singapore. https://doi.org/10.1007/978-981-99-4761-4_9
Download citation
DOI: https://doi.org/10.1007/978-981-99-4761-4_9
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4760-7
Online ISBN: 978-981-99-4761-4
eBook Packages: Computer ScienceComputer Science (R0)