Abstract
Despite their achievements in object recognition, Convolutional Neural Networks (CNNs) often fail to generalize to unseen viewpoints of a learned object, even when trained on a substantial number of samples. Recently introduced capsule networks, on the other hand, outperform CNNs in novel viewpoint generalization tasks with significantly fewer parameters. Capsule networks group neuron activations to represent higher-level attributes and their interactions, with the aim of achieving equivariance to visual transformations. However, capsule networks incur a high computational cost because the interactions between capsules in consecutive layers are learned via the so-called routing algorithm. To address these issues, we propose a novel routing algorithm, Alleviated Pose Attentive Capsule Agreement (ALPACA), which is tailored for capsules that jointly contain pose, feature, and existence probability information, in order to enhance the novel viewpoint generalization of capsules on 2D images. For this purpose, we have created the Novel ViewPoint Dataset (NVPD), a viewpoint-controlled, texture-free dataset with 8 different setups in which training and test samples are formed from different viewpoints. In addition to NVPD, we have conducted experiments on the iLab2M dataset, which is split in terms of object instances. Experimental results show that ALPACA outperforms its capsule network counterparts and state-of-the-art CNNs on the iLab2M and NVPD datasets. Moreover, ALPACA is 10 times faster than routing-based capsule networks. It also outperforms attention-based routing algorithms while keeping inference and training times comparable. Our code, the NVPD dataset, test setups, and implemented models are freely available at https://github.com/Boazrciasn/ALPACA.








Data Availability
The code and the dataset used in this research are freely available at https://github.com/Boazrciasn/ALPACA.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Özcan, B., Kınlı, F. & Kıraç, F. Generalization to unseen viewpoint images of objects via alleviated pose attentive capsule agreement. Neural Comput & Applic 35, 3521–3536 (2023). https://doi.org/10.1007/s00521-022-07900-3
DOI: https://doi.org/10.1007/s00521-022-07900-3