DOI: 10.1145/3595916.3626440

ADNet: An Asymmetric Dual-Stream Network for RGB-T Salient Object Detection

Published: 01 January 2024

Abstract

RGB-Thermal salient object detection (RGB-T SOD) aims to locate salient objects in images that carry both RGB and thermal information. Previous approaches typically design a symmetric network structure to cope with low-quality RGB or thermal images. However, we contend that the RGB and thermal modalities differ in channel count and information density. In this paper, we propose a novel asymmetric dual-stream network (ADNet). Specifically, we leverage an asymmetric backbone to extract four stages of RGB features and four stages of thermal features. To enable effective interaction among the low-level features of the first two stages, we introduce a Channel-Spatial Interaction (CSI) module. In the last two stages, deep features are enhanced by a Self-Attention Enhancement (SAE) module. Experimental results on the VT5000, VT1000, and VT821 datasets attest to the superior performance of ADNet over state-of-the-art methods.



      Published In

      MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
      December 2023
      745 pages
      ISBN: 9798400702051
      DOI: 10.1145/3595916

      Publisher

      Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. RGB-T salient object detection
      2. cross attention
      3. cross-modal fusion
      4. multi-head self-attention

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • the Fundamental Research Funds for the Central Universities
      • Key R&D Project of Jiangsu Province
      • the Program B for Outstanding Ph.D. candidate of Nanjing University
      • National Natural Science Foundation of China
      • the Collaborative Innovation Center of Novel Software Technology and Industrialization

      Conference

      MMAsia '23: ACM Multimedia Asia
      December 6-8, 2023
      Tainan, Taiwan

      Acceptance Rates

      Overall Acceptance Rate: 59 of 204 submissions (29%)
