Abstract
Capsule Neural Networks (CapsNets) are an attempt to model the neural organization found in biological neural networks. Through the routing-by-agreement algorithm, an attention mechanism is implemented: individual capsules focus on specific upstream capsules while ignoring the rest. Using this routing algorithm, CapsNets are able to attend to overlapping digits from the MNIST dataset. In this work, we evaluate the attention capabilities of capsule networks with routing-by-agreement on occluded shape stimuli like those used in neurophysiology, implementing a more compact type of capsule network. Our results on classifying both non-occluded and occluded shapes show that CapsNets can indeed differentiate occlusion from near-occlusion situations, as real biological neurons do. Reconstruction of the occluded stimuli also shows promising results in our experiments.
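The routing-by-agreement mechanism referenced above can be illustrated with a minimal NumPy sketch of dynamic routing in the style of Sabour et al. (2017). This is not the authors' implementation; the shapes, iteration count, and function names are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squashing nonlinearity: preserves the vector's orientation while
    # scaling its length into [0, 1), so length can encode probability.
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def routing_by_agreement(u_hat, num_iters=3):
    # u_hat: prediction vectors from lower-level capsules,
    # shape (num_lower, num_upper, capsule_dim).
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))  # routing logits
    for _ in range(num_iters):
        # Coupling coefficients: softmax over upper capsules,
        # so each lower capsule distributes its output among them.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        # Weighted sum of predictions per upper capsule.
        s = (c[:, :, None] * u_hat).sum(axis=0)
        v = squash(s)  # upper capsule outputs
        # Agreement update: predictions aligned with the output
        # get a larger routing weight next iteration.
        b += (u_hat * v[None, :, :]).sum(axis=-1)
    return v
```

The attention-like behavior arises from the agreement update: lower capsules whose predictions disagree with an upper capsule's output are progressively ignored by it.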
Acknowledgments
We would like to thank Sebastian Stabinger for his useful comments and Prof. Anitha Pasupathy for providing the program to create the single shape stimuli.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Rodríguez-Sánchez, A., Dick, T. (2019). Capsule Networks for Attention Under Occlusion. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions. ICANN 2019. Lecture Notes in Computer Science(), vol 11731. Springer, Cham. https://doi.org/10.1007/978-3-030-30493-5_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30492-8
Online ISBN: 978-3-030-30493-5