Do Humans Look Where Deep Convolutional Neural Networks “Attend”?

Abstract
Deep Convolutional Neural Networks (CNNs) have recently begun to exhibit human-level performance on some visual perception tasks. Performance remains relatively poor, however, on other vision tasks, such as object detection: specifying the location and object class of every object in a still image. We hypothesized that this performance gap may be largely due to the fact that humans exhibit selective attention, while most object detection CNNs have no corresponding mechanism. To examine this question, we investigated several well-known attention mechanisms from the deep learning literature, identifying their weaknesses; this analysis led us to propose a novel attention algorithm, the Densely Connected Attention Model. We then measured human spatial attention, in the form of eye-tracking data, during the performance of an analogous object detection task. By comparing the attentional patterns produced by various CNN architectures with those exhibited by human viewers, we identified relative strengths and weaknesses of the examined computational attention mechanisms. Some CNNs produced attentional patterns somewhat similar to those of humans. Others focused processing on objects in the foreground. Still other CNN attention mechanisms produced usefully interpretable internal representations. The resulting comparisons provide insight into the relationship between CNN attention algorithms and the human visual system.
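The abstract does not specify how CNN attention was quantified or compared with the eye-tracking data, so the sketch below is an illustrative assumption rather than the authors' method: it computes a Grad-CAM saliency map (Selvaraju et al., one of the well-known attention mechanisms in this literature) from an off-the-shelf VGG-16 in PyTorch, then scores its similarity to a human fixation heatmap with Pearson correlation. The model choice, layer index, and similarity measure are all assumptions made for illustration.

# Minimal sketch: compare a Grad-CAM map from a CNN with a human
# fixation heatmap. Grad-CAM stands in here for "a well-known attention
# mechanism"; the paper's Densely Connected Attention Model is named in
# the abstract but not specified there.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1").eval()

activations, gradients = {}, {}
layer = model.features[28]  # last conv layer of VGG-16 (assumed choice)

def fwd_hook(module, inputs, output):
    activations["a"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    gradients["g"] = grad_output[0].detach()

layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

def grad_cam(image, class_idx=None):
    """image: (1, 3, 224, 224) tensor; returns a (224, 224) saliency map."""
    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()
    # Weight each channel by its spatially averaged gradient, ReLU the sum.
    weights = gradients["g"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False).squeeze()
    return cam / (cam.max() + 1e-8)

def attention_similarity(cam, fixation_map):
    """Pearson correlation between a CNN saliency map and a human
    fixation heatmap of the same spatial size (both 2-D tensors)."""
    x = cam.flatten() - cam.mean()
    y = fixation_map.flatten() - fixation_map.mean()
    return (x @ y) / (x.norm() * y.norm() + 1e-8)

In use, image would be a normalized 224 × 224 crop and fixation_map a Gaussian-smoothed heatmap of fixation coordinates resized to the same grid. Pearson correlation is only one of several standard map-similarity measures (AUC and normalized scanpath saliency are common alternatives); the abstract does not say which, if any, the authors used.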