
Visual saliency prediction using multi-scale attention gated network

  • Regular Paper
  • Published in Multimedia Systems

Abstract

Predicting human visual attention can not only deepen our understanding of the underlying biological mechanisms, but also bring new insights to other computer vision tasks such as autonomous driving and human–computer interaction. Current deep learning-based methods often emphasize high-level semantic features for prediction. However, high-level semantic features lack fine-scale spatial information. Ideally, a saliency prediction model should exploit both spatial and semantic features. In this paper, we propose a multi-scale attention gated network (referred to as MSAGNet) that fuses semantic features of different spatial resolutions for visual saliency prediction. Specifically, we adopt the high-resolution net (HRNet) as the backbone to extract multi-scale semantic features. A multi-scale attention gating module is designed to adaptively fuse these features in a hierarchical way. Unlike the conventional approach of concatenating features from multiple layers or multi-scale inputs, this module computes a spatial attention map from the high-level semantic feature and fuses it with the low-level spatial feature through a gating operation. Through this hierarchical gated fusion, the final saliency prediction is produced at the finest scale. Extensive experiments on three benchmark datasets demonstrate the superior performance of the proposed method.
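The gating idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' exact module: the function name `attention_gate`, the nearest-neighbour upsampling, and the channel-mean-plus-sigmoid attention map are all simplifying assumptions standing in for the paper's learned convolutions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(low_feat, high_feat):
    """Gate a low-level spatial feature map with an attention map
    derived from a high-level semantic feature map.

    low_feat:  (C, H, W) fine-scale feature
    high_feat: (C, h, w) coarse semantic feature, with h | H and w | W
    """
    C, H, W = low_feat.shape
    # Nearest-neighbour upsample the coarse map to the fine resolution
    # (a stand-in for learned upsampling).
    up = high_feat.repeat(H // high_feat.shape[1], axis=1) \
                  .repeat(W // high_feat.shape[2], axis=2)
    # Collapse channels to a single spatial attention map in [0, 1]
    # (a stand-in for a learned 1x1 convolution).
    attn = sigmoid(up.mean(axis=0, keepdims=True))   # shape (1, H, W)
    # Elementwise gating: the semantic map decides where the
    # fine-scale detail is allowed to pass through.
    return low_feat * attn

low = np.random.rand(4, 8, 8)    # fine-scale feature
high = np.random.rand(4, 4, 4)   # coarse semantic feature
fused = attention_gate(low, high)
print(fused.shape)  # (4, 8, 8)
```

In the hierarchical scheme described in the abstract, such a gate would be applied repeatedly, coarse-to-fine, so the final prediction is formed at the finest scale.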



Acknowledgements

This work is partially supported by the National Natural Science Foundation of China under Grants U2001211, 61672292 and 61532009, and by the Basic Research Program of Jiangsu Province (Frontier Leading Technology) under Grant BK20192004.

Author information

Corresponding author

Correspondence to Kai Hu.

Additional information

Communicated by B.-K. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Sun, Y., Zhao, M., Hu, K. et al. Visual saliency prediction using multi-scale attention gated network. Multimedia Systems 28, 131–139 (2022). https://doi.org/10.1007/s00530-021-00796-4
