Abstract
Recent advances in supervised salient object detection modeling have resulted in significant performance improvements on benchmark datasets. However, most existing salient object detection models assume that at least one salient object exists in the input image. This assumption often leads to poor saliency maps on background images that contain no salient object at all; handling such cases explicitly can therefore reduce a model's false positive rate. In this paper, we propose a supervised learning approach for jointly addressing the salient object detection and existence prediction problems. Given a set of background-only images and images with salient objects, together with their salient object annotations, we adopt the structural SVM framework and formulate the two problems jointly in a single integrated objective function: the saliency labels of superpixels enter a classification term conditioned on the salient object existence variable, which in turn depends on global image features, regional saliency features, and the saliency label assignments. The loss function considers both image-level and region-level misclassifications. Extensive evaluation on benchmark datasets validates the effectiveness of the proposed joint approach compared to baseline and state-of-the-art models.
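To make the joint formulation more concrete, the following is a minimal sketch of how an existence variable and superpixel labels might be scored and penalized together. It is not the paper's actual objective: the linear score, the function names (joint_predict, joint_loss), the weight vectors, and the image_weight parameter are all assumptions introduced for illustration. The sketch only mirrors the two ideas stated in the abstract, namely that region labels are conditioned on the existence variable and that the loss combines an image-level and a region-level term.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def joint_predict(
    region_feats: Sequence[Sequence[float]],
    global_feat: Sequence[float],
    w_region: Sequence[float],
    w_global: Sequence[float],
) -> Tuple[int, List[int]]:
    """Maximize a hypothetical joint score over (e, y).

    Assumed score: e * (w_global . g + sum_i y_i * (w_region . f_i)).
    With e = 0 every label is forced to 0 and the score is 0.
    With e = 1 the best labeling marks exactly the superpixels whose
    linear score is positive.
    """
    region_scores = [dot(w_region, f) for f in region_feats]
    labels = [1 if s > 0 else 0 for s in region_scores]
    score_exist = dot(w_global, global_feat) + sum(s for s in region_scores if s > 0)
    if score_exist > 0:
        return 1, labels
    # Background-only image: no superpixel may be labeled salient.
    return 0, [0] * len(region_feats)


def joint_loss(
    e_true: int,
    y_true: Sequence[int],
    e_pred: int,
    y_pred: Sequence[int],
    image_weight: float = 1.0,
) -> float:
    """Image-level 0/1 loss plus the mean region-level Hamming loss."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    image_term = image_weight * float(e_true != e_pred)
    region_term = sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)
    return image_term + region_term
```

In this sketch, predicting that no salient object exists automatically yields an all-zero saliency map, which is the behavior that suppresses false positives on background-only images. A structural SVM would learn the weight vectors by penalizing outputs in proportion to a loss like joint_loss; that training step is omitted here.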
Acknowledgements
This research was supported by the National Natural Science Foundation of China (Grant Nos. 61572264, 61620106008) and CAST young talents plan.
Author information
Huaizu Jiang is currently a PhD student in the College of Information and Computer Sciences, University of Massachusetts, Amherst, USA. He received his BS and MS degrees from Xi’an Jiaotong University, China in 2005 and 2009, respectively. He is interested in how to teach an intelligent machine to understand visual scenes as a human does.
Ming-Ming Cheng received his PhD degree from Tsinghua University, China in 2012. He then became a research fellow working with Prof. Philip Torr at the University of Oxford, UK. He is now an associate professor at Nankai University, China. His research interests include computer graphics, computer vision, and image processing.
Shi-Jie Li received his BS degree from the University of Electronic Science and Technology of China, China in 2016. He is now a master's student in the Department of Computer Science, Nankai University, China, working with Prof. Ming-Ming Cheng.
Ali Borji received his BS and MS degrees in computer engineering from Petroleum University of Technology, Tehran, Iran in 2001 and Shiraz University, Shiraz, Iran in 2004, respectively. He received his PhD in cognitive neurosciences from the Institute for Studies in Fundamental Sciences (IPM), Tehran, Iran in 2009 and spent four years as a postdoctoral scholar at iLab, University of Southern California, USA, from 2010 to 2014. He is currently an assistant professor at the University of Central Florida, Orlando, USA. His research interests include visual attention, active learning, object and scene recognition, and cognitive and computational neurosciences.
Jingdong Wang received his PhD degree in computer science from The Hong Kong University of Science and Technology, China in 2007. He is currently a Lead Researcher with the Internet Media Group, Microsoft Research, Beijing, China. His current research interests include computer vision, machine learning, and multimedia. He has served as an Area Chair for CVPR 2017, ECCV 2016, ACMMM 2015, and ICME 2015, and as a Track Chair for ICME 2012. He is an Editorial Board Member of the IEEE Transactions on Multimedia and the International Journal of Multimedia Tools and Applications, and an Associate Editor of the International Journal of Neurocomputing. He has shipped more than 10 technologies to Microsoft products, including XiaoIce Chatbot, Microsoft cognitive service, and Bing search.
Cite this article
Jiang, H., Cheng, MM., Li, SJ. et al. Joint salient object detection and existence prediction. Front. Comput. Sci. 13, 778–788 (2019). https://doi.org/10.1007/s11704-017-6613-8
DOI: https://doi.org/10.1007/s11704-017-6613-8