
Attentive Models in Vision: Computing Saliency Maps in the Deep Learning Era

  • Conference paper

AI*IA 2017 Advances in Artificial Intelligence (AI*IA 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10640)


Abstract

Estimating the focus of attention of a person looking at an image or a video is a crucial step that can enhance many vision-based inference mechanisms: image segmentation and annotation, video captioning, and autonomous driving are some examples. The early stages of attentive behavior are typically bottom-up; reproducing the same mechanism means finding the saliency embodied in the images, i.e. which parts of an image pop out of the visual scene. This process has been studied for decades in neuroscience and through computational models that reproduce the human cortical process. In the last few years, early models have been replaced by deep learning architectures, which outperform every early approach on public datasets. In this paper, we discuss why convolutional neural networks (CNNs) are so accurate in saliency prediction. We present our deep architectures, which combine bottom-up cues with higher-level semantics, and incorporate the concept of time in the attentional process through LSTM recurrent architectures. Finally, we present a video-specific architecture based on the C3D network, which extracts spatio-temporal features by means of 3D convolutions to model task-driven attentive behaviors. The merit of this work is to show that these deep networks are not mere brute-force methods tuned on massive amounts of data, but well-defined architectures that closely recall the early saliency models, enriched with the semantics learned from human ground truth.
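To make this recipe concrete, the following is a minimal, illustrative PyTorch sketch of the general pattern described above, not the authors' released model: a shallow convolutional stream supplies bottom-up cues, a deeper stream supplies semantics, and a 1x1 convolution fuses them into a single-channel saliency map. The class name and all layer sizes are hypothetical.

```python
# Minimal multi-level saliency predictor, assuming PyTorch is available.
# "ToySaliencyNet" and every layer size are illustrative, not the paper's model.
import torch
import torch.nn as nn

class ToySaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Bottom-up stream: early convolutions respond to low-level cues
        # (edges, contrast), much like hand-crafted saliency features.
        self.low = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        # Deeper stream: pooling enlarges the receptive field, so these
        # features can capture the semantics learned from fixation data.
        self.high = nn.Sequential(
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        )
        # Fuse both levels into a single-channel saliency map.
        self.head = nn.Conv2d(32 + 64, 1, kernel_size=1)

    def forward(self, x):
        low = self.low(x)      # (B, 32, H, W): low-level features
        high = self.high(low)  # (B, 64, H, W): semantic features
        return torch.sigmoid(self.head(torch.cat([low, high], dim=1)))

saliency = ToySaliencyNet()(torch.rand(1, 3, 128, 128))  # -> (1, 1, 128, 128)
```

The same fusion idea extends to video by swapping Conv2d for Conv3d, as in C3D-style networks, or by passing the fused maps through a recurrent (LSTM) module that refines the prediction over time.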


Notes

  1. Numerical and qualitative results of the Itti-Koch model have been generated using the re-implementation of [14], which is also the one reported in the MIT Saliency Benchmark [6]. (A toy sketch of the center-surround principle behind such early models follows these notes.)

  2. Attentive subsequences in DR(eye)VE are clips in which the driver is looking far from the image center due to a peculiar maneuver being performed. We refer the reader to [26] for details.
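For context on what such early models compute, here is a toy center-surround operator in the spirit of the Itti-Koch family, assuming NumPy and SciPy; it is a didactic sketch, not the re-implementation of [14]. Saliency is approximated as the local deviation between a fine-scale ("center") and a coarse-scale ("surround") blurred view of the image, with arbitrary illustrative sigma values.

```python
# Toy center-surround saliency on a grayscale image; sigmas are arbitrary.
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround_saliency(gray, center_sigma=1.0, surround_sigma=8.0):
    """gray: 2-D float array in [0, 1]; returns a saliency map in [0, 1]."""
    center = gaussian_filter(gray, center_sigma)      # fine scale
    surround = gaussian_filter(gray, surround_sigma)  # coarse scale
    sal = np.abs(center - surround)                   # pop-out = local contrast
    return sal / (sal.max() + 1e-8)                   # normalize to [0, 1]

sal = center_surround_saliency(np.random.rand(240, 320))
```

Full models of this family repeat the operation over several feature channels (intensity, color opponency, orientation) and scales before combining the resulting conspicuity maps.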

References

  1. Alletto, S., Palazzi, A., Solera, F., Calderara, S., Cucchiara, R.: DR(eye)VE: a dataset for attention-based tasks with applications to autonomous and assisted driving. In: CVPR Workshops (2016)

  2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)

  3. Baraldi, L., Grana, C., Cucchiara, R.: Hierarchical boundary-aware neural encoder for video captioning. In: CVPR (2017)

  4. Bazzani, L., Larochelle, H., Torresani, L.: Recurrent mixture density network for spatiotemporal visual attention. In: ICLR (2017)

  5. Bruce, N., Tsotsos, J.: Saliency based on information maximization. In: NIPS, pp. 155–162 (2005)

  6. Bylinskii, Z., Judd, T., Borji, A., Itti, L., Durand, F., Oliva, A., Torralba, A.: MIT saliency benchmark. http://saliency.mit.edu/

  7. Bylinskii, Z., Judd, T., Oliva, A., Torralba, A., Durand, F.: What do different evaluation metrics tell us about saliency models? arXiv preprint arXiv:1604.03605 (2016)

  8. Cornia, M., Baraldi, L., Serra, G., Cucchiara, R.: A deep multi-level network for saliency prediction. In: ICPR (2016)

  9. Cornia, M., Baraldi, L., Serra, G., Cucchiara, R.: Multi-level net: a visual saliency prediction model. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 302–315. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_21

  10. Cornia, M., Baraldi, L., Serra, G., Cucchiara, R.: Predicting human eye fixations via an LSTM-based saliency attentive model. arXiv preprint arXiv:1611.09571 (2017)

  11. Cornia, M., Baraldi, L., Serra, G., Cucchiara, R.: Visual saliency for image captioning in new multimedia services. In: ICME Workshops (2017)

  12. Greenspan, H., Belongie, S., Goodman, R., Perona, P., Rakshit, S., Anderson, C.H.: Overcomplete steerable pyramid filters and rotation invariance. In: CVPR (1994)

  13. Hadizadeh, H., Bajić, I.V.: Saliency-aware video compression. IEEE Trans. Image Process. 23(1), 19–33 (2014)

  14. Harel, J., Koch, C., Perona, P.: Graph-based visual saliency. In: NIPS, pp. 545–552 (2006)

  15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

  16. Huang, X., Shen, C., Boix, X., Zhao, Q.: SALICON: reducing the semantic gap in saliency prediction by adapting deep neural networks. In: ICCV (2015)

  17. Itti, L., Koch, C.: Computational modelling of visual attention. Nat. Rev. Neurosci. 2(3), 194–203 (2001)

  18. Itti, L., Koch, C., Niebur, E., et al.: A model of saliency-based visual attention for rapid scene analysis. IEEE TPAMI 20(11), 1254–1259 (1998)

  19. Jetley, S., Murray, N., Vig, E.: End-to-end saliency mapping via probability distribution prediction. In: CVPR (2016)

  20. Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: saliency in context. In: CVPR (2015)

  21. Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations. MIT Technical report (2012)

  22. Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: ICCV (2009)

  23. Koch, C., Ullman, S.: Shifts in selective visual attention: towards the underlying neural circuitry. In: Vaina, L.M. (ed.) Matters of Intelligence, pp. 115–141. Springer, Dordrecht (1987). https://doi.org/10.1007/978-94-009-3833-5_5

  24. Kruthiventi, S.S., Ayush, K., Babu, R.V.: DeepFix: a fully convolutional neural network for predicting human eye fixations. arXiv preprint arXiv:1510.02927 (2015)

  25. Kümmerer, M., Theis, L., Bethge, M.: DeepGaze I: boosting saliency prediction with feature maps trained on ImageNet. In: ICLR Workshops (2015)

  26. Palazzi, A., Solera, F., Calderara, S., Alletto, S., Cucchiara, R.: Learning to attend like a human driver. In: Intelligent Vehicles Symposium (2017)

  27. Pan, J., Sayrol, E., Giró-i-Nieto, X., McGuinness, K., O'Connor, N.E.: Shallow and deep convolutional networks for saliency prediction. In: CVPR (2016)

  28. Rudoy, D., Goldman, D.B., Shechtman, E., Zelnik-Manor, L.: Learning video saliency from human gaze using candidate selection. In: CVPR (2013)

  29. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  30. Mathe, S., Sminchisescu, C.: Actions in the eye: dynamic gaze datasets and learnt saliency models for visual recognition. IEEE TPAMI 37(7), 1408–1424 (2015)

  31. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: ICCV (2015)

  32. Treisman, A.M., Gelade, G.: A feature-integration theory of attention. Cogn. Psychol. 12(1), 97–136 (1980)

  33. Vig, E., Dorr, M., Cox, D.: Large-scale optimization of hierarchical features for saliency prediction in natural images. In: CVPR (2014)

  34. Wang, W., Shen, J., Porikli, F.: Saliency-aware geodesic video object segmentation. In: CVPR (2015)

  35. Wang, W., Shen, J., Shao, L.: Consistent video saliency using local gradient flow optimization and global refinement. IEEE Trans. Image Process. 24(11), 4185–4196 (2015)

  36. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53

  37. Zhai, Y., Shah, M.: Visual attention detection in video sequences using spatiotemporal cues. In: ACM MM (2006)

  38. Zhang, J., Sclaroff, S.: Saliency detection: a Boolean map approach. In: ICCV (2013)

  39. Zhong, S.H., Liu, Y., Ren, F., Zhang, J., Ren, T.: Video saliency detection via dynamic consistent spatio-temporal attention modelling. In: AAAI (2013)

Author information

Corresponding author

Correspondence to Marcella Cornia.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Cornia, M., Abati, D., Baraldi, L., Palazzi, A., Calderara, S., Cucchiara, R. (2017). Attentive Models in Vision: Computing Saliency Maps in the Deep Learning Era. In: Esposito, F., Basili, R., Ferilli, S., Lisi, F. (eds.) AI*IA 2017 Advances in Artificial Intelligence. AI*IA 2017. Lecture Notes in Computer Science, vol. 10640. Springer, Cham. https://doi.org/10.1007/978-3-319-70169-1_29

  • DOI: https://doi.org/10.1007/978-3-319-70169-1_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70168-4

  • Online ISBN: 978-3-319-70169-1

  • eBook Packages: Computer Science, Computer Science (R0)
