Abstract
Convolutional neural networks (CNNs) have become a major driving force for visual saliency prediction. However, because features from different layers have diverse characteristics, not all of them are effective for saliency prediction, and some even introduce interference. To effectively fuse multiscale features from different layers, in this paper we propose a Feature and Gaussian Optimization Network (FGO-Net) for saliency prediction. Specifically, we first design a novel attention ConvLSTM (ACL) module with circular soft layer attention that iteratively reweights multilevel features from different layers. In addition, previous works generally apply Gaussian blur with a fixed kernel size to the generated saliency maps as post-processing. To address this, we design an adaptive Gaussian blur (AGB) module that automatically selects an appropriate Gaussian kernel size to blur the saliency map, removing the need for post-processing. Extensive experiments on several public saliency datasets demonstrate that the proposed FGO-Net achieves competitive results in terms of various evaluation metrics.
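The abstract describes the AGB module only at a high level. As a rough, hypothetical sketch of the adaptive-blur idea (not the FGO-Net implementation), the following PyTorch snippet predicts soft weights over a small bank of Gaussian kernel sizes and blends the correspondingly blurred saliency maps; the kernel-size bank, the sigma heuristic, and the small selector head are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_kernel(size, sigma):
    """Build a normalized 2-D Gaussian kernel of the given size and sigma."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2.0
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    kernel = g[:, None] * g[None, :]
    return kernel / kernel.sum()

class AdaptiveGaussianBlur(nn.Module):
    """Blur a predicted saliency map with a learned mixture of Gaussian kernels.

    Instead of a fixed post-processing blur, a tiny head predicts soft weights
    over a bank of candidate kernel sizes and the blurred maps are blended
    accordingly (a sketch of the adaptive idea, not the paper's AGB module).
    """
    def __init__(self, kernel_sizes=(3, 7, 11, 15)):
        super().__init__()
        self.kernel_sizes = kernel_sizes
        # One fixed Gaussian kernel per candidate size (sigma ~ size / 6 is an assumption).
        for k in kernel_sizes:
            self.register_buffer(f"kernel_{k}",
                                 gaussian_kernel(k, k / 6.0).view(1, 1, k, k))
        # Small head mapping a global summary of the saliency map to kernel weights.
        self.selector = nn.Sequential(
            nn.AdaptiveAvgPool2d(8),
            nn.Flatten(),
            nn.Linear(64, len(kernel_sizes)),
        )

    def forward(self, sal):  # sal: (B, 1, H, W) saliency map
        weights = F.softmax(self.selector(sal), dim=1)            # (B, K)
        blurred = []
        for k in self.kernel_sizes:
            kernel = getattr(self, f"kernel_{k}")
            blurred.append(F.conv2d(sal, kernel, padding=k // 2))  # same-size blur
        blurred = torch.stack(blurred, dim=1)                      # (B, K, 1, H, W)
        return (weights.view(-1, len(self.kernel_sizes), 1, 1, 1) * blurred).sum(dim=1)

# Example: blur a batch of two 240x320 saliency maps.
sal = torch.rand(2, 1, 240, 320)
out = AdaptiveGaussianBlur()(sal)
print(out.shape)  # torch.Size([2, 1, 240, 320])
```

A soft mixture is used here purely so the kernel selection stays differentiable and trainable end-to-end; how FGO-Net actually selects the kernel size is detailed in the paper itself.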







Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61902139).
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
The first two authors contributed equally to this work.
Cite this article
Pei, J., Zhou, T., Tang, H. et al. FGO-Net: Feature and Gaussian Optimization Network for visual saliency prediction. Appl Intell 53, 6214–6229 (2023). https://doi.org/10.1007/s10489-022-03647-5