DOI: 10.1145/3595916.3626440

ADNet: An Asymmetric Dual-Stream Network for RGB-T Salient Object Detection

Published: 01 January 2024

Abstract

RGB-Thermal salient object detection (RGB-T SOD) aims to locate salient objects in images that carry both RGB and thermal information. Previous approaches typically design a symmetric network structure to cope with low-quality RGB or thermal images. However, we contend that the RGB and thermal modalities differ in channel count and information density. In this paper, we propose a novel asymmetric dual-stream network (ADNet). Specifically, we leverage an asymmetric backbone to extract four stages of RGB features and four stages of thermal features. To enable effective interaction among the low-level features of the first two stages, we introduce a Channel-Spatial Interaction (CSI) module. In the last two stages, deep features are enhanced by a Self-Attention Enhancement (SAE) module. Experimental results on the VT5000, VT1000, and VT821 datasets attest to the superior performance of ADNet over state-of-the-art methods.



      Published In

      MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
      December 2023
      745 pages
      ISBN: 9798400702051
      DOI: 10.1145/3595916

      Publisher

      Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. RGB-T salient object detection
      2. cross attention
      3. cross-modal fusion
      4. multi-head self-attention

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • the Fundamental Research Funds for the Central Universities
      • Key R&D Project of Jiangsu Province
      • the Program B for Outstanding Ph.D. candidate of Nanjing University
      • National Natural Science Foundation of China
      • the Collaborative Innovation Center of Novel Software Technology and Industrialization

      Conference

      MMAsia '23: ACM Multimedia Asia
      December 6-8, 2023
      Tainan, Taiwan

      Acceptance Rates

      Overall Acceptance Rate: 59 of 204 submissions (29%)
