Three-Stage Bidirectional Interaction Network for Efficient RGB-D Salient Object Detection

Conference paper in Computer Vision – ACCV 2022 (ACCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13845)

Abstract

The addition of depth maps improves the performance of salient object detection (SOD). However, most existing RGB-D SOD methods are inefficient. We observe that existing models exploit the respective advantages of the two modalities but do not fully explore the roles of cross-modality features at different levels. To this end, we remodel the relationship between RGB features and depth features from the new perspective of the feature encoding stage and propose a three-stage bidirectional interaction network (TBINet). Specifically, to obtain robust feature representations, we propose three interaction strategies: bidirectional attention guidance (BAG), bidirectional feature supplement (BFS), and a shared network, and apply them to the three stages of the feature encoder, respectively. In addition, we propose a cross-modality feature aggregation (CFA) module for feature aggregation and refinement. Our model is lightweight (3.7 M parameters) and fast (329 ms on CPU). Experiments on six benchmark datasets show that TBINet outperforms other state-of-the-art methods and achieves the best trade-off between performance and efficiency.
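The abstract names three encoder-stage interaction strategies (BAG, BFS, and a shared network) plus a CFA fusion module but, as an abstract, gives no implementation detail. Below is a minimal PyTorch sketch of how such a three-stage bidirectional pipeline could be wired; the module internals (1×1-convolution attention, residual cross-modal projections, concatenation fusion) are illustrative assumptions, not the authors' actual design, which is detailed in the paper itself.

```python
# Hypothetical sketch of the three interaction strategies named in the
# abstract. Internals are illustrative guesses, not the paper's modules.
import torch
import torch.nn as nn


class BAG(nn.Module):
    """Bidirectional attention guidance (sketch): each modality derives a
    spatial attention map that gates the other modality's features."""
    def __init__(self, c):
        super().__init__()
        self.rgb_att = nn.Sequential(nn.Conv2d(c, 1, 1), nn.Sigmoid())
        self.dep_att = nn.Sequential(nn.Conv2d(c, 1, 1), nn.Sigmoid())

    def forward(self, rgb, dep):
        # Depth attention gates RGB, and vice versa (bidirectional).
        return rgb * self.dep_att(dep), dep * self.rgb_att(rgb)


class BFS(nn.Module):
    """Bidirectional feature supplement (sketch): each stream receives a
    projected residual from the other stream."""
    def __init__(self, c):
        super().__init__()
        self.to_rgb = nn.Conv2d(c, c, 3, padding=1)
        self.to_dep = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, rgb, dep):
        return rgb + self.to_rgb(dep), dep + self.to_dep(rgb)


class CFA(nn.Module):
    """Cross-modality feature aggregation (sketch): concatenate the two
    streams and refine with a single conv block."""
    def __init__(self, c):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * c, c, 3, padding=1),
            nn.BatchNorm2d(c),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb, dep):
        return self.fuse(torch.cat([rgb, dep], dim=1))


if __name__ == "__main__":
    c = 32
    rgb = torch.randn(1, c, 64, 64)
    dep = torch.randn(1, c, 64, 64)
    shared = nn.Conv2d(c, c, 3, padding=1)   # stage 3: one conv, both streams
    rgb, dep = BAG(c)(rgb, dep)              # stage 1: attention guidance
    rgb, dep = BFS(c)(rgb, dep)              # stage 2: feature supplement
    rgb, dep = shared(rgb), shared(dep)      # stage 3: shared weights
    fused = CFA(c)(rgb, dep)                 # aggregation for the decoder
    print(fused.shape)                       # torch.Size([1, 32, 64, 64])
```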

Author information

Corresponding author: Yanqing Zhang.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 1970 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, Y., Zhang, Y. (2023). Three-Stage Bidirectional Interaction Network for Efficient RGB-D Salient Object Detection. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13845. Springer, Cham. https://doi.org/10.1007/978-3-031-26348-4_13

  • DOI: https://doi.org/10.1007/978-3-031-26348-4_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26347-7

  • Online ISBN: 978-3-031-26348-4

  • eBook Packages: Computer Science, Computer Science (R0)
