research-article

Multi-Scale Cascade Network for Salient Object Detection

Authors:

Leiting Chen

MM '17: Proceedings of the 25th ACM international conference on Multimedia

Pages 439 - 447

https://doi.org/10.1145/3123266.3123290

Published: 19 October 2017

Abstract

In this paper we present a novel network architecture, called Multi-Scale Cascade Network (MSC-Net), to identify the most visually conspicuous objects in an image. Our network consists of several stages (sub-networks) for handling saliency detection across different scales. All these sub-networks form a cascade structure (in a coarse-to-fine manner) where the same underlying convolutional feature representations are fully shared. Compared with existing CNN-based saliency models, the MSC-Net can naturally enable the learning process in the finer cascade stages to encode more global contextual information while progressively incorporating the saliency prior knowledge obtained from coarser stages and thus lead to better detection accuracy. We also design a novel refinement module to further filter out errors by considering the intermediate feedback information. Our MSC-Net is highly integrated, end-to-end trainable, and very powerful. The proposed method achieves state-of-the-art performance on five widely-used salient object detection benchmarks, outperforming existing methods and also maintaining high efficiency. Code and pre-trained models are available at https://github.com/lixin666/MSC-NET.
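The coarse-to-fine data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' MSC-Net: there are no learned weights, the refinement module is not modelled, and every name here (`avg_pool`, `upsample`, `cascade_saliency`, `factors`, `prior_weight`) is a hypothetical stand-in chosen for illustration. It only shows one shared feature map being read at several scales, with each finer stage receiving the coarser stage's saliency map as a prior.

```python
import math


def avg_pool(grid, factor):
    """Downsample a square 2D list by averaging factor x factor blocks."""
    n = len(grid) // factor
    area = factor * factor
    return [
        [
            sum(
                grid[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ) / area
            for j in range(n)
        ]
        for i in range(n)
    ]


def upsample(grid, size):
    """Nearest-neighbour upsampling of a square 2D list to size x size."""
    scale = size // len(grid)
    return [[grid[i // scale][j // scale] for j in range(size)] for i in range(size)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cascade_saliency(shared_features, factors=(4, 2, 1), prior_weight=1.0):
    """Run a toy coarse-to-fine cascade over ONE shared feature map.

    Assumption: each "stage" is a fixed formula standing in for a learned
    sub-network. Returns the per-stage saliency maps, coarsest first.
    """
    outputs = []
    prior = None
    for factor in factors:
        # Every stage reads the same underlying features, only at a different scale.
        feats = avg_pool(shared_features, factor)
        size = len(feats)
        if prior is None:
            # The coarsest stage has no prior; 0.5 contributes nothing below.
            prior_up = [[0.5] * size for _ in range(size)]
        else:
            # Finer stages receive the coarser saliency map as prior knowledge.
            prior_up = upsample(prior, size)
        stage = [
            [
                sigmoid(feats[i][j] + prior_weight * (prior_up[i][j] - 0.5))
                for j in range(size)
            ]
            for i in range(size)
        ]
        outputs.append(stage)
        prior = stage
    return outputs


if __name__ == "__main__":
    # A 4x4 feature map with a "salient" top-left quadrant (made-up values).
    feats = [[2.0 if i < 2 and j < 2 else -2.0 for j in range(4)] for i in range(4)]
    for stage_map in cascade_saliency(feats):
        print(len(stage_map), "x", len(stage_map), "saliency map")
```

In this sketch the stages produce 1x1, 2x2, and 4x4 maps in turn; an actual cascade would replace the fixed formula in each stage with trained convolutional layers and supervise every stage's output.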



Index Terms

Multi-Scale Cascade Network for Salient Object Detection
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning


Information

Published In


MM '17: Proceedings of the 25th ACM international conference on Multimedia

October 2017

2028 pages

ISBN:9781450349062

DOI:10.1145/3123266

General Chairs:
Qiong Liu (FXPAL, USA)
Rainer Lienhart (Universität Augsburg, Germany)
Haohong Wang (TCL America, USA)

Program Chairs:
Sheng-Wei "Kuan-Ta" Chen (Academia Sinica, Taiwan)
Susanne Boll (University of Oldenburg, Germany)
Phoebe Chen (La Trobe University, Australia)
Gerald Friedland (Lawrence Livermore National Lab, USA)
Jia Li (Google, USA)
Shuicheng Yan (Qihoo 360, China)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Chinese National Programs for High Technology Research and Development (863 Program)
MSTP of Dongguan
National Natural Science Foundation of China

Conference

MM '17: ACM Multimedia Conference

Sponsor: SIGMM

October 23 - 27, 2017

Mountain View, California, USA

Acceptance Rates

MM '17 Paper Acceptance Rate 189 of 684 submissions, 28%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
