Research Article · DOI: 10.1145/3394171.3413523

MMNet: Multi-Stage and Multi-Scale Fusion Network for RGB-D Salient Object Detection

Published: 12 October 2020

Abstract

Most existing RGB-D salient object detection (SOD) methods directly extract and fuse raw features from the RGB and depth backbones, so they are easily impaired by low-quality depth maps and redundant cross-modal features. To capture multi-scale cross-modal fusion features effectively, this paper proposes a novel Multi-stage and Multi-Scale Fusion Network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Inspired by the stage doctrine of color vision in the human visual system, the CMFM explores useful and important feature representations in the feature response stage and integrates them into effective cross-modal fusion features in the adversarial combination stage. The BMD then learns to combine cross-modal fusion features from multiple levels, capturing both local and global information about salient objects and further boosting performance. Comprehensive experiments demonstrate that the proposed method consistently outperforms 14 state-of-the-art methods on six popular RGB-D datasets under 8 evaluation metrics.
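The abstract does not include code; as a rough illustration of the two-stage cross-modal fusion idea it describes, the following minimal NumPy sketch gates each modality's features by the other modality's channel response and then combines them. The function names and the gating scheme are hypothetical simplifications for illustration, not the authors' actual CMFM.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_fusion(rgb_feat, depth_feat):
    """Toy two-stage fusion: each modality is gated by the other's
    channel-wise response (stage 1), then the gated features are
    combined into a single fused map (stage 2)."""
    # Stage 1 (feature response): channel gates via global average pooling
    rgb_gate = sigmoid(rgb_feat.mean(axis=(1, 2), keepdims=True))
    depth_gate = sigmoid(depth_feat.mean(axis=(1, 2), keepdims=True))
    # Cross-modal guidance: depth responses re-weight RGB channels, and vice versa
    rgb_guided = rgb_feat * depth_gate
    depth_guided = depth_feat * rgb_gate
    # Stage 2 (combination): merge the complementary responses
    return rgb_guided + depth_guided

# C x H x W feature maps standing in for backbone outputs
rgb = np.random.rand(32, 8, 8)
depth = np.random.rand(32, 8, 8)
fused = cross_modal_fusion(rgb, depth)
print(fused.shape)  # (32, 8, 8)
```

The gating step is one common way to let a low-quality depth map down-weight, rather than corrupt, the RGB stream; the paper's actual module is more elaborate.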

Supplementary Material

MP4 File (3394171.3413523.mp4)
Most existing RGB-D salient object detection (SOD) methods directly extract and fuse raw features from RGB and depth backbones. Such methods can be easily restricted by low-quality depth maps and redundant cross-modal features.

In this study, we present a deep-learning framework for RGB-D SOD. Inspired by the mechanism of the visual color stage doctrine, we design a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD) for accurate RGB-D SOD. Comprehensive experiments demonstrate that our proposed method achieves considerable improvements over 14 state-of-the-art methods on six benchmark datasets using 8 evaluation metrics.




    Published In

    MM '20: Proceedings of the 28th ACM International Conference on Multimedia
    October 2020
    4889 pages
    ISBN:9781450379885
    DOI:10.1145/3394171
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. adversarial combination
    2. cross-modal guided attention
    3. RGB-D image
    4. salient object detection


    Funding Sources

    • Open Projects Program of National Laboratory of Pattern Recognition (NLPR)
    • start-up fund of Shenzhen Graduate School of Peking University
    • CCF-Tencent Open Fund
    • National Natural Science Foundation of China
    • Natural Science Foundation of Ningbo
    • Ministry of Science and Technology of China - Science and Technology Innovations 2030
    • Fundamental Research Funds for the Provincial Universities of Zhejiang
    • Shenzhen Science and Technology Plan Basic Research Project
    • Natural Science Foundation of Guangdong Province
    • Open Project Program of the State Key Lab of CAD&CG, Zhejiang University
    • Shenzhen Research Projects

    Conference

    MM '20

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


    Cited By

    • (2024) MLBSNet: Mutual Learning and Boosting Segmentation Network for RGB-D Salient Object Detection. Electronics 13(14), 2690. DOI: 10.3390/electronics13142690. 10 Jul 2024.
    • (2024) FARFusion V2: A Geometry-based Radar-Camera Fusion Method on the Ground for Roadside Far-Range 3D Object Detection. Proceedings of the 32nd ACM International Conference on Multimedia, 8421-8430. DOI: 10.1145/3664647.3681128. 28 Oct 2024.
    • (2024) Iterative Saliency Aggregation and Assignment Network for Efficient Salient Object Detection in Optical Remote Sensing Images. IEEE Transactions on Geoscience and Remote Sensing 62, 1-13. DOI: 10.1109/TGRS.2024.3425658.
    • (2024) Zoom to Perceive Better: No-Reference Point Cloud Quality Assessment via Exploring Effective Multiscale Feature. IEEE Transactions on Circuits and Systems for Video Technology 34(7), 6334-6346. DOI: 10.1109/TCSVT.2024.3362369. Jul 2024.
    • (2024) Expand, Pool and Confine: Reliably Detaching Salient Objects From the Background. IEEE Transactions on Consumer Electronics 70(3), 5353-5362. DOI: 10.1109/TCE.2024.3430354. Aug 2024.
    • (2024) Illumination Robust Semantic Segmentation Based on Cross-Dimensional Multispectral Edge Fusion in Dynamic Traffic Scenes. IEEE Access 12, 171589-171600. DOI: 10.1109/ACCESS.2024.3498896.
    • (2024) Towards Diverse Binary Segmentation via a Simple yet General Gated Network. International Journal of Computer Vision 132(10), 4157-4234. DOI: 10.1007/s11263-024-02058-y. 7 May 2024.
    • (2024) Introduction to 3D Point Clouds: Datasets and Perception. Deep Learning for 3D Point Clouds, 1-27. DOI: 10.1007/978-981-97-9570-3_1. 10 Oct 2024.
    • (2023) ALFPN: Adaptive Learning Feature Pyramid Network for Small Object Detection. International Journal of Intelligent Systems 2023, 1-14. DOI: 10.1155/2023/6266209. 21 Apr 2023.
    • (2023) AttentiveNet: Detecting Small Objects for LiDAR Point Clouds by Attending to Important Points. 2023 IEEE International Conference on Visual Communications and Image Processing (VCIP), 1-5. DOI: 10.1109/VCIP59821.2023.10402679. 4 Dec 2023.
