Abstract
Despite its success in recent years, the convolutional neural network (CNN) has a major limitation: the inability to retain spatial relationships between learned features in deeper layers. The capsule network with dynamic routing (CapsNet), introduced in 2017, was speculated to overcome this limitation. In our research, we created a suitable collection of datasets and implemented a simple CNN model and a CapsNet model of similar complexity to test this speculation. Experimental results show that both the implemented CNN and CapsNet models are able to capture the spatial relationships between learned features. Counterintuitively, our experiments show that our CNN model outperforms our CapsNet model on our datasets, which implies that the speculation is not entirely correct. This might be because our datasets are too simple and therefore favour a simple CNN model. We recommend that future research test the speculation using deeper models and more complex datasets.
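The CapsNet referenced above follows the dynamic-routing design introduced in 2017, in which each capsule outputs a vector whose length encodes the probability that the entity it represents is present. A minimal NumPy sketch of the "squash" nonlinearity that enforces this property (an illustrative reimplementation, not the authors' actual code):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule 'squash' nonlinearity:
    v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||).
    Short vectors are shrunk toward zero and long vectors toward unit
    length, so the output length can act as a presence probability."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

# A long input vector squashes to length just under 1,
# a short one to a length well under its own norm.
long_v = squash(np.array([3.0, 4.0]))   # input norm 5
short_v = squash(np.array([0.3, 0.4]))  # input norm 0.5
print(np.linalg.norm(long_v))   # ~0.96
print(np.linalg.norm(short_v))  # ~0.2
```

Because squashing preserves direction, the orientation of a capsule's output vector is free to encode pose parameters of the detected feature, which is the mechanism by which CapsNet is speculated to retain spatial relationships.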
Acknowledgment
The authors are grateful to the Ministry of Higher Education, Malaysia, and Multimedia University for the financial support provided under the Fundamental Research Grant Scheme (MMUE/150030) and the MMU Internal Grant Scheme (MMUI/170110).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Manogaran, U., Wong, Y.P., Ng, B.Y. (2021). CapsNet vs CNN: Analysis of the Effects of Varying Feature Spatial Arrangement. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_1
DOI: https://doi.org/10.1007/978-3-030-55187-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55186-5
Online ISBN: 978-3-030-55187-2
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)