Bidirectional Alternating Fusion Network for RGB-T Salient Object Detection

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15038)

Abstract

RGB-Thermal Salient Object Detection (SOD) aims to identify common salient regions or objects from both the visible and thermal infrared modalities. Existing methods usually rely on hierarchical interactions within the same modality or between different modalities at the same level. However, this approach may let one modality or one level of features dominate the fusion result, failing to fully exploit the complementary information of the two modalities. Additionally, these methods usually overlook the network's potential to extract modality-specific information. To address these issues, we propose a Bidirectional Alternating Fusion Network (BAFNet) consisting of three modules for RGB-T salient object detection. In particular, we design a Global Information Enhancement Module (GIEM) to improve the representation of high-level features. We then propose a novel bidirectional alternating fusion strategy applied during decoding, and design a Multi-modal Multi-level Fusion Module (MMFM) to coordinate multi-modal, multi-level information. Furthermore, we embed the proposed Modal Erase Module (MEM) into both GIEM and MMFM to extract the inherent specific information of each modality. Extensive experiments on three public benchmark datasets show that our method achieves outstanding performance compared with state-of-the-art methods.
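
The chapter body is not included in this preview, so the sketch below is only a rough, hypothetical illustration of the abstract's central idea: a fusion step in which the two modalities alternately take the lead, so that neither RGB nor thermal features dominate the fused result. Everything here (the AlternatingFusionStep module, its layers, and the alternation schedule) is an assumption made for illustration in PyTorch, not the authors' BAFNet implementation.

    # Hypothetical sketch of bidirectional alternating fusion for RGB-T
    # features. NOT the paper's code: all names and layer choices are
    # illustrative assumptions based only on the abstract.
    import torch
    import torch.nn as nn

    class AlternatingFusionStep(nn.Module):
        """Refine each modality with cues from the other, swapping which
        modality leads on successive decoding levels."""

        def __init__(self, channels: int):
            super().__init__()
            self.rgb_from_t = nn.Conv2d(2 * channels, channels, 3, padding=1)
            self.t_from_rgb = nn.Conv2d(2 * channels, channels, 3, padding=1)
            self.act = nn.ReLU(inplace=True)

        def forward(self, f_rgb, f_t, rgb_first: bool = True):
            if rgb_first:
                # RGB is refined first using thermal cues ...
                f_rgb = f_rgb + self.act(self.rgb_from_t(torch.cat([f_rgb, f_t], dim=1)))
                # ... then thermal is refined using the already-updated RGB.
                f_t = f_t + self.act(self.t_from_rgb(torch.cat([f_t, f_rgb], dim=1)))
            else:
                # Reverse order: thermal leads, RGB follows.
                f_t = f_t + self.act(self.t_from_rgb(torch.cat([f_t, f_rgb], dim=1)))
                f_rgb = f_rgb + self.act(self.rgb_from_t(torch.cat([f_rgb, f_t], dim=1)))
            return f_rgb, f_t

    if __name__ == "__main__":
        step = AlternatingFusionStep(channels=64)
        rgb = torch.randn(1, 64, 32, 32)  # toy RGB feature map
        th = torch.randn(1, 64, 32, 32)   # toy thermal feature map
        # Alternate the leading modality across decoding levels.
        for level in range(4):
            rgb, th = step(rgb, th, rgb_first=(level % 2 == 0))
        print(rgb.shape, th.shape)

Alternating which modality leads is one simple way to realize "bidirectional alternating" fusion: because the second modality always sees the other's already-refined features, the direction of information flow reverses at each level rather than one modality consistently driving the fusion.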

This work was supported in part by the University Synergy Innovation Program of Anhui Province under Grant No. GXXT-2022-014 and in part by the National Natural Science Foundation of China under Grant No. 62376005.

Author information

Corresponding author

Correspondence to Sulan Zhai.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Tu, Z., Lin, D., Jiang, B., Gu, L., Wang, K., Zhai, S. (2025). Bidirectional Alternating Fusion Network for RGB-T Salient Object Detection. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15038. Springer, Singapore. https://doi.org/10.1007/978-981-97-8685-5_3

  • DOI: https://doi.org/10.1007/978-981-97-8685-5_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-8684-8

  • Online ISBN: 978-981-97-8685-5

  • eBook Packages: Computer Science, Computer Science (R0)
