Abstract
Capsule network (CapsNet) was introduced as an enhancement over convolutional neural networks, supplementing the latter’s invariance properties with equivariance through pose estimation. CapsNet achieved very decent performance with a shallow architecture and a significant reduction in parameter count. However, the width of the first layer in CapsNet still contributes a significant number of its parameters, and its shallowness may limit the representational power of the capsules. To address these limitations, we introduce Path Capsule Network (PathCapsNet), a deep parallel multi-path version of CapsNet. We show that a judicious coordination of depth, max-pooling, regularization by DropCircuit and a new fan-in routing-by-agreement technique can achieve better or comparable results to CapsNet, while further reducing the parameter count significantly.
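For context, the routing-by-agreement mechanism that PathCapsNet builds on is the dynamic routing of Sabour et al. (2017). The sketch below shows that standard algorithm in plain NumPy; it is illustrative only, and does not reproduce the paper's fan-in variant, whose aggregation differs. Array shapes and iteration count are assumptions, not taken from the paper.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linearity from Sabour et al.: shrinks vector norms into [0, 1)
    # while preserving direction, so norm can act as an "existence" probability.
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, iterations=3):
    # u_hat: (num_in, num_out, dim) prediction vectors, i.e. each lower-level
    # capsule's prediction for each higher-level capsule's pose.
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, initialized uniform
    for _ in range(iterations):
        # Coupling coefficients: softmax over the output capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = np.einsum('ij,ijd->jd', c, u_hat)      # weighted sum of predictions
        v = squash(s)                              # candidate output capsules
        b = b + np.einsum('ijd,jd->ij', u_hat, v)  # reward agreeing predictions
    return v  # (num_out, dim)
```

Each iteration increases the coupling between a lower capsule and whichever higher capsule its prediction agrees with (large dot product), which is the "agreement" that the paper's fan-in routing also exploits, under a different normalization.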
Acknowledgements
We acknowledge the use of Athena at HPC Midlands+, funded by EPSRC Grant EP/P020232/1, as part of the HPC Midlands+ consortium. This work was partially supported by a grant from Microsoft’s AI for Earth program.
Cite this article
Amer, M., Maul, T. Path Capsule Networks. Neural Process Lett 52, 545–559 (2020). https://doi.org/10.1007/s11063-020-10273-0