Bi-directional Interaction and Dense Aggregation Network for RGB-D Salient Object Detection

Yi, Kang; Tang, Haoran; Bai, Hongyu; Wang, Yinjie; Xu, Jing; Li, Ping

doi:10.1007/978-3-031-53305-1_36

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14554)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

RGB-D salient object detection (SOD), which aims to detect the most prominent regions in an image by jointly modeling RGB and depth information, has attracted much attention recently. However, existing methods explore cross-modality information from RGB images and depth maps without considering the potential coupling between them. This may lead to insufficient learning from the two modalities and can even introduce conflicts caused by their decoupled representations. In this paper, we therefore propose a novel framework, the Bi-directional Interaction and Dense Aggregation Network (BIDANet), for RGB-D salient object detection. First, we design depth-guided enhancement (DGE) and RGB-induced style transfer (RST) so that the depth map and the RGB image learn from each other through a bi-directional interaction network. Second, we adopt adaptive cross-modal fusion (ACF) to flexibly integrate the learned multi-modal features. Finally, we propose a dense aggregation network (DAN) that aggregates cross-stage outcomes and generates accurate saliency predictions. Extensive experiments on five widely used datasets demonstrate that BIDANet achieves superior performance compared with 14 state-of-the-art methods.
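
The abstract names the building blocks but gives no formulas, so the following is only an illustrative sketch of how a bi-directional interaction followed by adaptive fusion could be wired together. Every function here is hypothetical and is not the authors' implementation: `adain` assumes that RGB-induced style transfer resembles adaptive instance normalization (depth features re-normalized to RGB channel statistics), `depth_guided_enhance` assumes a sigmoid gate with a residual connection, and `adaptive_fuse` assumes a single learned-free weight per channel. Features are toy lists of flattened channels rather than real tensors, and the dense aggregation network is not covered.

```python
from math import exp, sqrt
from statistics import fmean, pvariance

EPS = 1e-5  # keeps the standard deviation strictly positive


def channel_stats(channel):
    """Return mean and (epsilon-stabilised) std of one flattened channel."""
    mu = fmean(channel)
    return mu, sqrt(pvariance(channel, mu) + EPS)


def adain(content, style):
    """Assumed stand-in for RST: re-normalise each content channel so it
    takes on the mean and std of the matching style channel."""
    out = []
    for c, s in zip(content, style):
        c_mu, c_sigma = channel_stats(c)
        s_mu, s_sigma = channel_stats(s)
        out.append([s_sigma * (x - c_mu) / c_sigma + s_mu for x in c])
    return out


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def depth_guided_enhance(rgb, depth):
    """Assumed stand-in for DGE: depth acts as a sigmoid gate on the RGB
    features, and the gated response is added back residually."""
    return [[r + r * sigmoid(d) for r, d in zip(rc, dc)]
            for rc, dc in zip(rgb, depth)]


def adaptive_fuse(rgb, depth):
    """Assumed stand-in for ACF: one weight per channel, derived from the
    difference of mean activations, blends the two modalities."""
    fused = []
    for rc, dc in zip(rgb, depth):
        w = sigmoid(fmean(rc) - fmean(dc))
        fused.append([w * r + (1.0 - w) * d for r, d in zip(rc, dc)])
    return fused


# Toy single-channel features with four spatial positions each.
rgb = [[0.2, 0.4, 0.6, 0.8]]
depth = [[1.0, 3.0, 5.0, 7.0]]

depth_styled = adain(depth, rgb)                        # RGB -> depth direction
rgb_enhanced = depth_guided_enhance(rgb, depth_styled)  # depth -> RGB direction
fused = adaptive_fuse(rgb_enhanced, depth_styled)
print(fused)
```

In a real network the gate and the fusion weights would be produced by learned layers and applied at several encoder stages; the fixed formulas above only show the direction in which information flows between the two modalities.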

Author information

Authors and Affiliations

College of Artificial Intelligence, Nankai University, Tianjin, China
Kang Yi, Hongyu Bai, Yinjie Wang & Jing Xu
Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Haoran Tang & Ping Li

Corresponding author

Correspondence to Jing Xu.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
Delft University of Technology, Delft, The Netherlands
Alan Hanjalic
Delft University of Technology, Delft, The Netherlands
Cynthia Liem
University of Amsterdam, Amsterdam, The Netherlands
Marcel Worring
Reykjavik University, Reykjavik, Iceland
Björn Þór Jónsson
Microsoft Research Lab – Asia, Beijing, China
Bei Liu
The University of Tokyo, Tokyo, Japan
Yoko Yamakata

About this paper

Cite this paper

Yi, K., Tang, H., Bai, H., Wang, Y., Xu, J., Li, P. (2024). Bi-directional Interaction and Dense Aggregation Network for RGB-D Salient Object Detection. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14554. Springer, Cham. https://doi.org/10.1007/978-3-031-53305-1_36

DOI: https://doi.org/10.1007/978-3-031-53305-1_36
Published: 28 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53304-4
Online ISBN: 978-3-031-53305-1
eBook Packages: Computer Science, Computer Science (R0)
