FGO-Net: Feature and Gaussian Optimization Network for visual saliency prediction

Abstract

Convolutional neural networks (CNNs) have become a major driving force for visual saliency prediction. However, because features from different layers have diverse characteristics, not all of them are effective for saliency detection, and some even introduce interference. To effectively fuse multiscale features from different layers, we propose a feature and Gaussian optimization network (FGO-Net) for saliency prediction. Specifically, we first design a novel attention ConvLSTM (ACL) module that uses circular soft layer attention to iteratively reweight multilevel features from different layers. In addition, previous works generally apply Gaussian blur with a fixed kernel size to the generated saliency map as a post-processing step. We therefore design an adaptive Gaussian blur (AGB) module that automatically selects an appropriate Gaussian kernel size to blur the saliency map, removing the need for post-processing. Extensive experiments on several public saliency datasets demonstrate that the proposed FGO-Net achieves competitive results across various evaluation metrics.
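
To make the two mechanisms concrete, the following is a minimal PyTorch sketch based only on the abstract's description, not the authors' implementation: an iterative soft layer attention that reweights multilevel features (the ACL module's ConvLSTM recurrence is omitted), and an adaptive Gaussian blur that predicts a per-image blur strength as a continuous, differentiable stand-in for selecting a kernel size. All module, function, and parameter names here are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftLayerAttention(nn.Module):
    # Sketch of the "circular soft layer attention" idea: iteratively score
    # each feature level against the current fused state and reweight the
    # levels with a softmax before fusing again.
    def __init__(self, channels, num_iters=3):
        super().__init__()
        self.num_iters = num_iters
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, 1))

    def forward(self, feats):  # feats: list of L tensors, each (B, C, H, W)
        stacked = torch.stack(feats, dim=1)              # (B, L, C, H, W)
        fused = stacked.mean(dim=1)                      # initial fusion
        for _ in range(self.num_iters):
            scores = torch.stack([self.score(f + fused) for f in feats], dim=1)
            weights = scores.softmax(dim=1)[..., None, None]  # (B, L, 1, 1, 1)
            fused = (stacked * weights).sum(dim=1)       # reweighted fusion
        return fused

class AdaptiveGaussianBlur(nn.Module):
    # Sketch of the adaptive Gaussian blur idea: predict a per-image sigma
    # from pooled features, then apply a separable Gaussian filter of that
    # strength to the raw saliency map, replacing fixed-kernel post-processing.
    def __init__(self, channels, max_sigma=8.0, kernel_size=31):
        super().__init__()
        self.kernel_size, self.max_sigma = kernel_size, max_sigma
        self.sigma_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 1), nn.Sigmoid())  # maps to (0, 1)

    def forward(self, feats, saliency):  # feats (B,C,H,W); saliency (B,1,H,W)
        b, half = saliency.size(0), self.kernel_size // 2
        sigma = self.sigma_head(feats) * self.max_sigma + 1e-3    # (B, 1)
        x = torch.arange(-half, half + 1, device=saliency.device,
                         dtype=saliency.dtype)
        k = torch.exp(-x[None, :] ** 2 / (2 * sigma ** 2))        # (B, K)
        k = k / k.sum(dim=1, keepdim=True)
        s = saliency.view(1, b, *saliency.shape[-2:])  # fold batch into channels
        s = F.conv2d(F.pad(s, (half, half, 0, 0), mode="replicate"),
                     k.view(b, 1, 1, -1), groups=b)    # horizontal pass
        s = F.conv2d(F.pad(s, (0, 0, half, half), mode="replicate"),
                     k.view(b, 1, -1, 1), groups=b)    # vertical pass
        return s.view_as(saliency)

Both sketches are differentiable end to end, which is what allows the blur strength to be learned by the network rather than fixed as a post-processing step.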

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61902139).

Author information

Corresponding author

Correspondence to He Tang.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The first two authors contributed equally to this work.

About this article

Cite this article

Pei, J., Zhou, T., Tang, H. et al. FGO-Net: Feature and Gaussian Optimization Network for visual saliency prediction. Appl Intell 53, 6214–6229 (2023). https://doi.org/10.1007/s10489-022-03647-5
