Abstract
Visual attention plays a central role in natural and artificial systems for controlling perceptual resources. Classic artificial visual attention systems use salient image features derived from the responses of predefined filters. Recently, deep neural networks capable of recognizing thousands of objects have been developed; these networks autonomously generate visual features optimized by training on large data sets. Besides object recognition, such features have proven very successful in other visual problems such as object segmentation, tracking and, recently, visual attention. In this work we propose a biologically inspired object classification and localization framework that combines Deep Convolutional Neural Networks with foveal vision. First, a feed-forward pass is performed to obtain the predicted class labels. Next, object location proposals are obtained by applying a segmentation mask to the saliency map computed through a top-down backward pass. The main contribution of our work lies in the evaluation of the performance obtained with different non-uniform resolutions. We establish a relationship between performance and the amount of information preserved by each sensing configuration. The results demonstrate that it is not necessary to store and transmit all the information present in high-resolution images since, beyond a certain amount of preserved information, performance in the classification and localization tasks saturates.
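To make the two passes concrete, below is a minimal sketch of the forward-classification, backward-saliency, threshold-mask pipeline described above. It assumes PyTorch with a pretrained VGG16 from torchvision in place of the Caffe models used in the paper; the file name and threshold are illustrative stand-ins, the gradient-based saliency follows the general style of Simonyan, Vedaldi and Zisserman's "Deep inside convolutional networks" rather than the authors' exact implementation, and the foveal (space-variant) sampling stage is omitted.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained classifier as a stand-in for the networks used in the paper.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "example.jpg" is a hypothetical input image.
img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
img.requires_grad_(True)

# 1) Feed-forward pass: predicted class label.
scores = model(img)
pred = scores.argmax(dim=1).item()

# 2) Top-down backward pass: the gradient of the winning class score with
#    respect to the input pixels acts as a class-specific saliency map.
scores[0, pred].backward()
saliency = img.grad.abs().amax(dim=1).squeeze(0)  # max over RGB channels

# 3) Segmentation mask: threshold the saliency map (the 0.5 factor is
#    illustrative) and take its bounding box as the location proposal.
mask = saliency > 0.5 * saliency.max()
ys, xs = torch.nonzero(mask, as_tuple=True)
box = (xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item())
print(f"class {pred}, proposal box {box}")
```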
Acknowledgment
This work has been partially supported by the Portuguese Foundation for Science and Technology (FCT) project [UID/EEA/50009/2013]. Rui Figueiredo is funded by FCT PhD grant PD/BD/105779/2014.
Cite this paper
Almeida, A.F., Figueiredo, R., Bernardino, A., Santos-Victor, J. (2018). Deep Networks for Human Visual Attention: A Hybrid Model Using Foveal Vision. In: Ollero, A., Sanfeliu, A., Montano, L., Lau, N., Cardeira, C. (eds) ROBOT 2017: Third Iberian Robotics Conference. ROBOT 2017. Advances in Intelligent Systems and Computing, vol 694. Springer, Cham. https://doi.org/10.1007/978-3-319-70836-2_10