DOI:10.1145/3123266.3123290
research-article

Multi-Scale Cascade Network for Salient Object Detection

Published: 19 October 2017

Abstract

In this paper we present a novel network architecture, called Multi-Scale Cascade Network (MSC-Net), to identify the most visually conspicuous objects in an image. Our network consists of several stages (sub-networks) for handling saliency detection across different scales. All these sub-networks form a cascade structure (in a coarse-to-fine manner) where the same underlying convolutional feature representations are fully shared. Compared with existing CNN-based saliency models, the MSC-Net can naturally enable the learning process in the finer cascade stages to encode more global contextual information while progressively incorporating the saliency prior knowledge obtained from coarser stages and thus lead to better detection accuracy. We also design a novel refinement module to further filter out errors by considering the intermediate feedback information. Our MSC-Net is highly integrated, end-to-end trainable, and very powerful. The proposed method achieves state-of-the-art performance on five widely-used salient object detection benchmarks, outperforming existing methods and also maintaining high efficiency. Code and pre-trained models are available at https://github.com/lixin666/MSC-NET.
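The coarse-to-fine cascade described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation (their code is at the repository linked above), and it assumes several simplifications: a hand-crafted contrast map stands in for the shared convolutional features, the stage weights are fixed constants rather than learned parameters, and the refinement module is omitted. All function names and parameters (`shared_features`, `stage`, `cascade`, `w_feat`, `w_prior`, `bias`) are hypothetical.

```python
import math


def shared_features(image):
    """Toy stand-in for shared convolutional features: contrast to the global mean."""
    flat = [p for row in image for p in row]
    mean = sum(flat) / len(flat)
    return [[abs(p - mean) for p in row] for row in image]


def downsample(grid, factor):
    """Average-pool a 2D grid by an integer factor (sides must be divisible by it)."""
    out = []
    for i in range(0, len(grid), factor):
        row = []
        for j in range(0, len(grid[0]), factor):
            block = [grid[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample(grid, factor):
    """Nearest-neighbour upsampling of a 2D grid by an integer factor."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def stage(features, prior, w_feat=6.0, w_prior=2.0, bias=-3.0):
    """One cascade stage: fuse features with the prior from the coarser stage.

    The weights are arbitrary constants chosen for illustration (an assumption);
    a real network would learn them.
    """
    return [[sigmoid(w_feat * f + w_prior * p + bias) for f, p in zip(frow, prow)]
            for frow, prow in zip(features, prior)]


def cascade(image, factors=(4, 2, 1)):
    """Run stages from coarse to fine; returns one saliency map per stage.

    `factors` must be descending, and each must divide the previous one.
    """
    feats = shared_features(image)  # computed once, reused by every stage
    prior, prev_factor, outputs = None, None, []
    for factor in factors:
        f = downsample(feats, factor)
        if prior is None:
            p = [[0.0] * len(f[0]) for _ in f]  # coarsest stage has no prior
        else:
            p = upsample(prior, prev_factor // factor)
        prior = stage(f, p)
        prev_factor = factor
        outputs.append(prior)
    return outputs


# Usage: an 8x8 image with a bright 4x4 square in the centre.
image = [[1.0 if 2 <= r <= 5 and 2 <= c <= 5 else 0.0 for c in range(8)]
         for r in range(8)]
maps = cascade(image)  # maps[0] is 2x2, maps[1] is 4x4, maps[2] is 8x8
```

In this sketch each finer stage receives the upsampled output of the coarser stage as a prior, so the final map combines coarse context with fine detail. How MSC-Net fuses these signals in practice is specified only in the paper and the released code.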

Published In

MM '17: Proceedings of the 25th ACM international conference on Multimedia
October 2017
2028 pages
ISBN:9781450349062
DOI:10.1145/3123266
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computer vision
  2. deep learning
  3. salient object detection

Qualifiers

  • Research-article

Funding Sources

  • Chinese National Programs for High Technology Research and Development (863 Program)
  • MSTP of Dongguan
  • National Natural Science Foundation of China

Conference

MM '17: ACM Multimedia Conference
October 23 - 27, 2017
Mountain View, California, USA

Acceptance Rates

MM '17 Paper Acceptance Rate 189 of 684 submissions, 28%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (Last 12 months): 20
  • Downloads (Last 6 weeks): 2

Reflects downloads up to 17 Feb 2025

Cited By

  • (2024) Weighted Graph-Structured Semantics Constraint Network for Cross-Modal Retrieval. IEEE Transactions on Multimedia 26 (1551-1564). DOI:10.1109/TMM.2023.3282894. Online publication date: 1-Jan-2024
  • (2024) Enhancing learning on uncertain pixels in self-distillation for object segmentation. Complex & Intelligent Systems 10:5 (6545-6557). DOI:10.1007/s40747-024-01519-8. Online publication date: 15-Jun-2024
  • (2023) Distortion-aware Transformer in 360° Salient Object Detection. Proceedings of the 31st ACM International Conference on Multimedia (499-508). DOI:10.1145/3581783.3612025. Online publication date: 26-Oct-2023
  • (2023) MGL: Mutual Graph Learning for Camouflaged Object Detection. IEEE Transactions on Image Processing 32 (1897-1910). DOI:10.1109/TIP.2022.3223216. Online publication date: 2023
  • (2023) Nested Architecture Search for Point Cloud Semantic Segmentation. IEEE Transactions on Image Processing 32 (2889-2900). DOI:10.1109/TIP.2022.3147983. Online publication date: 2023
  • (2023) TCRNet: A Trifurcated Cascaded Refinement Network for Salient Object Detection. IEEE Transactions on Circuits and Systems for Video Technology 33:1 (298-311). DOI:10.1109/TCSVT.2022.3199780. Online publication date: Jan-2023
  • (2022) EFRNet: Efficient Feature Reconstructing Network for Real-Time Scene Parsing. IEEE Transactions on Multimedia 24 (2852-2865). DOI:10.1109/TMM.2021.3089422. Online publication date: 1-Jan-2022
  • (2022) FastVOD-Net: A Real-Time and High-Accuracy Video Object Detector. IEEE Transactions on Intelligent Transportation Systems 23:11 (20926-20942). DOI:10.1109/TITS.2022.3176721. Online publication date: Nov-2022
  • (2022) Detecting Camouflaged Object in Frequency Domain. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (4494-4503). DOI:10.1109/CVPR52688.2022.00446. Online publication date: Jun-2022
  • (2022) Object recognition datasets and challenges: A review. Neurocomputing 495 (129-152). DOI:10.1016/j.neucom.2022.01.022. Online publication date: Jul-2022