Abstract
Visual attention plays a central role in natural and artificial systems for controlling perceptual resources. Classic artificial visual attention systems use salient image features derived from the responses of predefined filters. Recently, deep neural networks capable of recognizing thousands of objects have been developed; these networks autonomously generate visual features optimized by training on large data sets. Besides object recognition, such features have proven very successful in other visual problems such as object segmentation, tracking and, recently, visual attention. In this work we propose a biologically inspired object classification and localization framework that combines Deep Convolutional Neural Networks with foveal vision. First, a feed-forward pass is performed to obtain the predicted class labels. Next, object location proposals are obtained by applying a segmentation mask to the saliency map computed through a top-down backward pass. The main contribution of our work lies in the evaluation of the performance obtained with different non-uniform resolutions. We establish a relationship between performance and the amount of information preserved by each sensing configuration. The results demonstrate that it is not necessary to store and transmit all the information present in high-resolution images since, beyond a certain amount of preserved information, performance in the classification and localization tasks saturates.
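To make the two passes concrete, below is a minimal sketch of the forward-classification, backward-saliency, threshold-mask pipeline described above. It assumes PyTorch with a pretrained VGG16 from torchvision in place of the Caffe models used in the paper; the file name and threshold are illustrative stand-ins, the gradient-based saliency follows the general style of Simonyan, Vedaldi and Zisserman's "Deep inside convolutional networks" rather than the authors' exact implementation, and the foveal (space-variant) sampling stage is omitted.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained classifier as a stand-in for the networks used in the paper.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "example.jpg" is a hypothetical input image.
img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
img.requires_grad_(True)

# 1) Feed-forward pass: predicted class label.
scores = model(img)
pred = scores.argmax(dim=1).item()

# 2) Top-down backward pass: the gradient of the winning class score with
#    respect to the input pixels acts as a class-specific saliency map.
scores[0, pred].backward()
saliency = img.grad.abs().amax(dim=1).squeeze(0)  # max over RGB channels

# 3) Segmentation mask: threshold the saliency map (the 0.5 factor is
#    illustrative) and take its bounding box as the location proposal.
mask = saliency > 0.5 * saliency.max()
ys, xs = torch.nonzero(mask, as_tuple=True)
box = (xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item())
print(f"class {pred}, proposal box {box}")
```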
Acknowledgment
This work has been partially supported by the Portuguese Foundation for Science and Technology (FCT) project [UID/EEA/50009/2013]. Rui Figueiredo is funded by FCT PhD grant PD/BD/105779/2014.
Cite this paper
Almeida, A.F., Figueiredo, R., Bernardino, A., Santos-Victor, J. (2018). Deep Networks for Human Visual Attention: A Hybrid Model Using Foveal Vision. In: Ollero, A., Sanfeliu, A., Montano, L., Lau, N., Cardeira, C. (eds) ROBOT 2017: Third Iberian Robotics Conference. ROBOT 2017. Advances in Intelligent Systems and Computing, vol 694. Springer, Cham. https://doi.org/10.1007/978-3-319-70836-2_10