Abstract
Salient object detection is an efficient preprocessing technique to deal with binary segmentation task. Existing works based on deep learning have achieved an enormous leap forward with outstanding performance in the field of computer vision. Most of the previous methods mainly adopted multi-scale fusion and attention mechanisms to facilitate efficient feature extraction yet ignored necessary global context characteristics and general models computational limitation. To mitigate the adverse effects of feature dilution during the top-to-down transmission, we propose a cross channel aggregation similarity network (CCANet) with three modules. Cross channel aggregation module retains high-response channels from integrated different layer feature maps to extract efficient global context information. Similarity fusion module calculates the similarity among various features consisting of high-level semantic, low-level spatial, and global context information to enhance the complementary of maps. Dense residual module extracts denser features under multi-scale receptive fields to improve the density of prediction maps. Besides, a combined loss function with modified weighted binary cross-entropy is applied to alleviate the class imbalance issue incurred in the training process. Benefited from the overall harmonious design, experimental results show that CCANet achieves state-of-the-art performance on six public benchmark datasets. Without any post-processing operations, it runs real-time inference at a speed of around 32 FPS when processing a 320 \(\times\) 320 image.
Similar content being viewed by others
References
Achanta R, Hemami SS, Estrada FJ, Süsstrunk S (2009) Frequency-tuned salient region detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1597–1604
Chen L, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII, vol. 11211, pp 833–851
Chen Z, Xu Q, Cong R, Huang Q (2020) Global context-aware progressive aggregation network for salient object detection. In: The Thirty-Fourth AAAI conference on artificial intelligence, the thirty-second innovative applications of artificial intelligence conference,the tenth AAAI symposium on educational advances in artificial intelligence, pp 10599–10606
Cheng M, Mitra NJ, Huang X, Torr PHS, Hu S (2015) Global contrast based salient region detection. IEEE Trans Pattern Anal Mach Intell 37(3):569–582
Deng R, Shen C, Liu S, Wang H, Liu X (2018) Learning to predict crisp boundaries. In: Computer vision European conference. Lecture Notes in Computer Science, vol 11210, pp 570–586
Everingham M, Gool LV, Williams CKI, Winn JM, Zisserman A (2010) The pascal visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338
Fan D, Cheng M, Liu J, Gao S, Hou Q, Borji A (2018) Salient objects in clutter: Bringing salient object detection to the foreground. In: Computer vision—ECCV 2018—15th European Conference, Munich, Germany, September 8–14, 2018, proceedings, Part XV, pp 196–212
Fan D, Gong C, Cao Y, Ren B, Cheng M, Borji A (2018) Enhanced-alignment measure for binary foreground map evaluation. In: Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, July 13–19, 2018, Stockholm, Sweden, pp 698–704
Feng M, Lu H, Ding E (2019) Attentive feedback network for boundary-aware salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1623–1632
Gao SH, Cheng MM, Zhao K, Zhang XY, Yang MH, Torr P (2020) Res2net: a new multi-scale backbone architecture. IEEE Trans Pattern Anal Mach Intell:1–1
Gao P, Zhang Q, Wang F, Xiao L, Fujita H, Zhang Y (2020) Learning reinforced attentional representation for end-to-end visual tracking. Inf Sci 517:52–67
Goodfellow IJ, Warde-Farley D, Mirza M, Courville AC, Bengio Y (2013) Maxout networks. Proc Int Conf Mach Learn 28:1319–1327
Guanbin L, Yizhou Y (2016) Visual saliency detection based on multiscale deep CNN features. IEEE Trans Image Process 25(11):5012–5024
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hou X, Zhang L (2007) Saliency detection: a spectral residual approach. In: Proceedings of the IEEE conference on computer vision and pattern recognition
Hou Q, Cheng M, Hu X, Borji A, Tu Z, Torr PHS (2019) Deeply supervised salient object detection with short connections. IEEE Trans Pattern Anal Mach Intell 41(4):815–828
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20(11):1254–1259
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems 25: 26th annual conference on neural information processing systems 2012. Proceedings of a meeting held December 3–6, 2012, Lake Tahoe, Nevada, United States, pp 1106–1114
Li G, Yu Y (2015) Visual saliency based on multiscale deep features. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5455–5463
Li Y, Hou X, Koch C, Rehg JM, Yuille AL (2014) The secrets of salient object segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 280–287
Lin T, Dollár P, Girshick RB, He K, Hariharan B, Belongie SJ (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 936–944
Liu C, Chen L, Schroff F, Adam H, Hua W, Yuille AL, Li F (2019) Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 82–92
Liu J, Hou Q, Cheng M, Feng J, Jiang J (2019) A simple pooling-based design for real-time salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3917–3926
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Luo Z, Mishra AK, Achkar A, Eichel JA, Li S, Jodoin P (2017) Non-local deep features for salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6593–6601
Milletari F, Navab N, Ahmadi S (2016) V-net: fully convolutional neural networks for volumetric medical image segmentation. In: Fourth international conference on 3D Vision, 3DV, pp 565–571
Movahedi V, Elder JH (2010) Design and perceptual validation of performance measures for salient object segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 49–56
Pan J, Sayrol E, Giró-i-Nieto X, McGuinness K, O’Connor NE (2016) Shallow and deep convolutional networks for saliency prediction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 598–606
Pang Y, Zhao X, Zhang L, Lu H (2020) Multi-scale interactive network for salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9410–9419
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Köpf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems 32: annual conference on neural information processing systems, pp 8024–8035
Qin X, Zhang ZV, Huang C, Gao C, Dehghan M, Jägersand M (2019) Basnet: boundary-aware salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7479–7489
Ren G, Dai T, Barmpoutis P, Stathaki T (2020) Salient object detection combining a self-attention module and a feature pyramid network. CoRR arXiv:2004.14552
Wang L, Lu H, Ruan X, Yang (2015) Deep networks for saliency detection via local estimation and global search. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3183–3192
Wang L, Wang L, Lu H, Zhang P, Ruan X (2016) Saliency detection with recurrent fully convolutional networks. In: Computer vision european conference, vol 9908, pp 825–841
Wang L, Lu H, Wang Y, Feng M, Wang D, Yin B, Ruan X (2017) Learning to detect salient objects with image-level supervision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3796–3805
Wang X, Girshick RB, Gupta A, He K (2018) Non-local neural networks. In: IEEE conference on computer vision and pattern recognition, pp 7794–7803
Wang W, Zhao S, Shen J, Hoi SCH, Borji A (2019) Salient object detection with pyramid attention and salient edges. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1448–1457
Wei J, Wang S, Huang Q (2019) F3net: Fusion, feedback and focus for salient object detection. CoRR arXiv:1911.11445
Woo S, Park J, Lee J, Kweon IS (2018) CBAM: convolutional block attention module. In: Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII, vol 11211, pp 3–19
Wu Z, Su L, Huang Q (2019) Cascaded partial decoder for fast and accurate salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3907–3916
Yan Q, Xu L, Shi J, Jia J (2013) Hierarchical saliency detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1155–1162
Yang C, Zhang L, Lu H, Ruan X, Yang M (2013) Saliency detection via graph-based manifold ranking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3166–3173
You Q, Jin H, Wang Z, Fang C, Luo J (2016) Image captioning with semantic attention. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4651–4659
Zhang P, Wang D, Lu H, Wang H, Ruan X (2017) Amulet: aggregating multi-level convolutional features for salient object detection. In: IEEE international conference on computer vision, pp 202–211
Zhang X, Wang T, Qi J, Lu H, Wang G (2018) Progressive attention guided recurrent network for salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 714–722
Zhang Q, Shi Y, Zhang X (2020) Attention and boundary guided salient object detection. Pattern Recogn 107:107484
Zhang S, Zhao W, Guan Z, Peng X, Peng J (2021) Keypoint-graph-driven learning framework for object pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1065–1073
Zhao W, Guan Z, Luo H, Peng J, Fan J (2017) Deep multiple instance hashing for object-based image retrieval. In: IJCAI, pp 3504–3510
Zhao J, Liu J, Fan D, Cao Y, Yang J, Cheng M (2019) Egnet: Edge guidance network for salient object detection. In: 2019 IEEE/CVF international conference on computer vision, pp 8778–8787
Zhao W, Zhang S, Guan Z, Luo H, Tang L, Peng J, Fan J (2020) 6d object pose estimation via viewpoint relation reasoning. Neurocomputing 389:9–17
Zhao X, Pang Y, Zhang L, Lu H, Zhang L (2020) Suppress and balance: a simple gated network for salient object detection. In: Computer Vision—ECCV 2020—16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II, vol 12347, pp 35–51
Zhao W, Zhang S, Guan Z, Zhao W, Peng J, Fan J (2020) Learning deep network for detecting 3d object keypoints and 6d poses. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14134–14142
Zheng Z, Yang X, Yu Z, Zheng L, Yang Y, Kautz J (2019) Joint discriminative and generative learning for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2138–2147
Zhou H, Xie X, Lai J, Chen Z, Yang L (2020) Interactive two-stream decoder for accurate and fast saliency detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9138–9147
Acknowledgements
This work was supported by National Science Foundation of China under Grant nos. 61672467, 61877055, 61976195 and 61902358, as well as in part by the Natural Science Foundation of Zhejiang under Grant LY18F030013, Basic Science and Technology Research of Heilongjiang under Grant 2020-KYYWFMY-0065.
Author information
Authors and Affiliations
Corresponding authors
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Chen, L., Liu, H., Mo, J. et al. Cross channel aggregation similarity network for salient object detection. Int. J. Mach. Learn. & Cyber. 13, 2153–2169 (2022). https://doi.org/10.1007/s13042-022-01512-y
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13042-022-01512-y