Deep Networks for Human Visual Attention: A Hybrid Model Using Foveal Vision

  • Conference paper
  • In: ROBOT 2017: Third Iberian Robotics Conference (ROBOT 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 694)

Abstract

Visual attention plays a central role in natural and artificial systems to control perceptual resources. Classic artificial visual attention systems use salient image features obtained from predefined filters. Recently, deep neural networks have been developed that recognize thousands of objects and that autonomously generate visual features optimized by training on large data sets. Besides object recognition, these features have been very successful in other visual problems such as object segmentation, tracking and, recently, visual attention. In this work we propose a biologically inspired object classification and localization framework that combines deep convolutional neural networks with foveal vision. First, a feed-forward pass is performed to obtain the predicted class labels. Next, we obtain object location proposals by applying a segmentation mask to the saliency map computed through a top-down backward pass. The main contribution of our work lies in the evaluation of the performance obtained with different non-uniform resolutions. We establish a relationship between performance and the amount of information preserved by each sensing configuration. The results demonstrate that we do not need to store and transmit all the information present in high-resolution images: beyond a certain amount of preserved information, performance on the classification and localization tasks saturates.
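
To make the pipeline summarized above concrete (foveal resampling, a feed-forward pass for the class label, a top-down backward pass for saliency, and a thresholded mask for localization), the sketch below illustrates each step. This is a minimal sketch, not the authors' implementation: it assumes PyTorch and a torchvision VGG-16 in place of the Caffe models used in the paper, it approximates the hybrid Gaussian-pyramid fovea with distance-dependent Gaussian blurring, and the `foveate` helper, the central fixation point, and the 95th-percentile saliency threshold are all hypothetical choices made for illustration.

```python
import torch
import torchvision.transforms.functional as TF
from torchvision.models import vgg16

def foveate(img, fixation, levels=4):
    """Approximate a foveal sensor: blend progressively blurred copies of
    `img` so that resolution decays with distance from `fixation`.
    (Hypothetical stand-in for the paper's hybrid Gaussian-pyramid fovea.)"""
    _, h, w = img.shape
    ys = torch.arange(h, dtype=torch.float32).view(-1, 1).expand(h, w)
    xs = torch.arange(w, dtype=torch.float32).view(1, -1).expand(h, w)
    dist = ((ys - fixation[0]) ** 2 + (xs - fixation[1]) ** 2).sqrt()
    dist = dist / dist.max()                      # normalize to [0, 1]
    out = img.clone()
    for k in range(1, levels):
        blurred = TF.gaussian_blur(img, kernel_size=4 * k + 1)
        ring = ((dist >= k / levels) & (dist < (k + 1) / levels)).float()
        out = out * (1 - ring) + blurred * ring   # outer ring, coarser detail
    return out

model = vgg16(weights="IMAGENET1K_V1").eval()

img = torch.rand(3, 224, 224)                     # placeholder input image
x = foveate(img, fixation=(112, 112)).unsqueeze(0).requires_grad_(True)

# 1) Feed-forward pass: predicted class label.
scores = model(x)
label = scores[0].argmax().item()

# 2) Top-down backward pass: the gradient of the winning class score with
#    respect to the input, collapsed over channels, serves as the saliency map.
scores[0, label].backward()
saliency = x.grad.abs().max(dim=1).values.squeeze(0)

# 3) Localization: threshold the saliency map into a segmentation mask and
#    take the bounding box of the surviving pixels as the location proposal.
mask = saliency > saliency.flatten().quantile(0.95)
rows, cols = mask.nonzero(as_tuple=True)
box = (cols.min().item(), rows.min().item(),
       cols.max().item(), rows.max().item())
print(label, box)
```

A faithful reproduction would additionally normalize the input with ImageNet statistics and sweep the foveation levels to trace the information-versus-performance curve the abstract refers to; those details are omitted here for brevity.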

Notes

  1. Source: http://image-net.org/challenges/LSVRC/2012/ [accessed June 2017].

Acknowledgment

This work has been partially supported by the Portuguese Foundation for Science and Technology (FCT) project [UID/EEA/50009/2013]. Rui Figueiredo is funded by FCT PhD grant PD/BD/105779/2014.

Author information

Correspondence to Rui Figueiredo.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Almeida, A.F., Figueiredo, R., Bernardino, A., Santos-Victor, J. (2018). Deep Networks for Human Visual Attention: A Hybrid Model Using Foveal Vision. In: Ollero, A., Sanfeliu, A., Montano, L., Lau, N., Cardeira, C. (eds) ROBOT 2017: Third Iberian Robotics Conference. ROBOT 2017. Advances in Intelligent Systems and Computing, vol 694. Springer, Cham. https://doi.org/10.1007/978-3-319-70836-2_10

  • DOI: https://doi.org/10.1007/978-3-319-70836-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70835-5

  • Online ISBN: 978-3-319-70836-2

  • eBook Packages: Engineering, Engineering (R0)
