Abstract
Predicting human visual attention can not only deepen our understanding of the underlying biological mechanisms, but also bring new insights to other computer vision tasks such as autonomous driving and human–computer interaction. Current deep learning-based methods often emphasize high-level semantic features for prediction. However, high-level semantic features lack fine-scale spatial information. Ideally, a saliency prediction model should combine both spatial and semantic features. In this paper, we propose a multi-scale attention gated network (referred to as MSAGNet) that fuses semantic features of different spatial resolutions for visual saliency prediction. Specifically, we adopt the high-resolution net (HRNet) as the backbone to extract multi-scale semantic features. A multi-scale attention gating module is designed to adaptively fuse these multi-scale features in a hierarchical way. Unlike the conventional approach of concatenating features from multiple layers or multi-scale inputs, this module computes a spatial attention map from the high-level semantic feature and fuses it with the low-level spatial feature through a gating operation. Through this hierarchical gated fusion, the final saliency prediction is produced at the finest scale. Extensive experiments on three benchmark datasets demonstrate the superior performance of the proposed method.
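To make the gating idea concrete, the following is a minimal NumPy sketch of one fusion step: a spatial attention map is derived from a coarse, semantically rich feature map and used to gate a fine-scale feature map. This is an illustrative assumption, not the authors' implementation; the channel-collapsing weights `w` (standing in for a learned 1×1 convolution) and the nearest-neighbour upsampling are simplifications.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(low_feat, high_feat, w):
    """Gate a fine-scale feature map with an attention map computed
    from a coarser, semantically richer feature map.

    low_feat:  (C, H, W)     fine-scale spatial features
    high_feat: (C, H/2, W/2) coarse high-level semantic features
    w:         (C,)          hypothetical 1x1-conv weights collapsing channels
    """
    # Collapse channels (a stand-in for a learned 1x1 conv), then squash
    # to (0, 1) with a sigmoid to obtain a spatial attention map.
    attn = sigmoid(np.tensordot(w, high_feat, axes=([0], [0])))  # (H/2, W/2)

    # Nearest-neighbour upsampling to the fine resolution.
    attn_up = attn.repeat(2, axis=0).repeat(2, axis=1)           # (H, W)

    # Gating: element-wise reweighting of the fine-scale features.
    return low_feat * attn_up[None, :, :]
```

In a hierarchical scheme such as the one described in the abstract, this step would be applied repeatedly from the coarsest pair of scales up to the finest, with the gated output of one level serving as the low-level input of the next.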
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China under Grants U2001211, 61672292, and 61532009, and partially supported by the Basic Research Program of Jiangsu Province (Frontier Leading Technology) under Grant BK20192004.
Additional information
Communicated by B.-K. Bao.
Cite this article
Sun, Y., Zhao, M., Hu, K. et al. Visual saliency prediction using multi-scale attention gated network. Multimedia Systems 28, 131–139 (2022). https://doi.org/10.1007/s00530-021-00796-4