Abstract
Despite its success in recent years, the convolutional neural network (CNN) has a major limitation: the inability to retain spatial relationships between learned features in deeper layers. The capsule network with dynamic routing (CapsNet), introduced in 2017, was speculated to overcome this limitation. In our research, we created a suitable collection of datasets and implemented a simple CNN model and a CapsNet model of similar complexity to test this speculation. Experimental results show that both the implemented CNN and CapsNet models are able to capture the spatial relationships between learned features. Counterintuitively, our experiments show that our CNN model outperforms our CapsNet model on our datasets, which implies that the speculation is not entirely correct. This might be because our datasets are too simple and therefore favour a simple CNN model. We recommend that future research test the speculation using deeper models and more complex datasets.
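The CapsNet referenced above follows the dynamic-routing design introduced in 2017, in which each capsule outputs a vector whose length encodes the probability that the entity it represents is present. A minimal NumPy sketch of the "squash" nonlinearity that enforces this property (an illustrative reimplementation, not the authors' actual code):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule 'squash' nonlinearity:
    v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||).
    Short vectors are shrunk toward zero and long vectors toward unit
    length, so the output length can act as a presence probability."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

# A long input vector squashes to length just under 1,
# a short one to a length well under its own norm.
long_v = squash(np.array([3.0, 4.0]))   # input norm 5
short_v = squash(np.array([0.3, 0.4]))  # input norm 0.5
print(np.linalg.norm(long_v))   # ~0.96
print(np.linalg.norm(short_v))  # ~0.2
```

Because squashing preserves direction, the orientation of a capsule's output vector is free to encode pose parameters of the detected feature, which is the mechanism by which CapsNet is speculated to retain spatial relationships.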
Acknowledgment
The authors are grateful to the Ministry of Higher Education, Malaysia, and Multimedia University for the financial support provided under the Fundamental Research Grant Scheme (MMUE/150030) and the MMU Internal Grant Scheme (MMUI/170110).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Manogaran, U., Wong, Y.P., Ng, B.Y. (2021). CapsNet vs CNN: Analysis of the Effects of Varying Feature Spatial Arrangement. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_1
DOI: https://doi.org/10.1007/978-3-030-55187-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55186-5
Online ISBN: 978-3-030-55187-2
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)