Abstract
In deep convolutional neural networks, the fully connected layer requires the input image to be of a fixed size. Stretching and cropping can bring an image to the required size, but these operations easily distort it. Methods based on groups of pooling layers convert variable-size feature maps into fixed-size ones; however, pooling discards information, which can significantly reduce recognition accuracy. To address this problem, we propose the shared conversion matrix routing (SCMR) layer, a general network layer that replaces the fully connected layer of a convolutional neural network and lets the resulting network handle multi-scale images without changing its original convolutional structure or parameters. Within the SCMR layer, we propose a RECOMBINATION method that dynamically increases or decreases the number of capsules according to the scale of the input image, so that both the convolutional layers and the SCMR layer operate normally. In addition, sharing the transformation matrix within the SCMR layer yields a new dynamic routing algorithm, which allows the SCMR layer to accept multi-dimensional convolutional feature maps and output fixed-size image features for classifying multi-scale images. Because the algorithm assigns every capsule its own weight, it avoids the feature loss caused by pooling, which improves the recognition rate. Furthermore, new capsules are created by increasing the dimensionality of the capsules, which addresses the exploding gradient problem in backpropagation. Experimental results on public datasets show that the proposed method achieves higher accuracy than current methods.
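The key property claimed above is that sharing the transformation matrix lets a routing layer map any number of input capsules to a fixed number of output capsules. The sketch below illustrates that property under our own assumptions and is not the authors' implementation: it uses the routing-by-agreement procedure of Sabour, Frosst and Hinton (2017) with one matrix per output capsule shared by all input capsules, and a plain reshape (`feature_map_to_capsules`, a hypothetical helper) stands in for the RECOMBINATION step, whose details the abstract does not give.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-8):
    """CapsNet squashing: keeps each vector's direction, maps its length into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def feature_map_to_capsules(fmap, caps_dim):
    """Hypothetical stand-in for capsule formation (not the paper's RECOMBINATION).

    Splits a (C, H, W) conv feature map into H * W * (C // caps_dim) capsules of
    length caps_dim, so the capsule count grows or shrinks with the image size.
    """
    c, h, w = fmap.shape
    assert c % caps_dim == 0, "channel count must be divisible by caps_dim"
    # (C, H, W) -> (H, W, C) -> (H * W * C / caps_dim, caps_dim)
    return fmap.transpose(1, 2, 0).reshape(-1, caps_dim)


def shared_matrix_routing(u, W, num_iters=3):
    """Dynamic routing with one transformation matrix per output capsule,
    shared by every input capsule (an assumed reading of "shared matrix").

    u: (N, d_in) input capsules; N may differ from image to image.
    W: (J, d_out, d_in) shared matrices; their shape does not depend on N.
    Returns v: (J, d_out) output capsules, the same size for any N.
    """
    assert num_iters >= 1, "need at least one routing iteration"
    # Prediction vectors u_hat[n, j] = W[j] @ u[n], shape (N, J, d_out)
    u_hat = np.einsum("jok,nk->njo", W, u)
    b = np.zeros(u_hat.shape[:2])  # routing logits, shape (N, J)
    for _ in range(num_iters):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)       # softmax over output capsules
        s = np.einsum("nj,njo->jo", c, u_hat)   # every input contributes, weighted by c
        v = squash(s)                           # (J, d_out)
        b = b + np.einsum("njo,jo->nj", u_hat, v)  # agreement update
    return v


# Usage: two feature-map "scales" produce different capsule counts,
# yet the routed output always has shape (10, 16).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(10, 16, 8))  # 10 classes, 8-D inputs -> 16-D outputs
for size in (6, 9):
    fmap = rng.normal(size=(32, size, size))  # (C, H, W) conv output
    caps = feature_map_to_capsules(fmap, caps_dim=8)
    v = shared_matrix_routing(caps, W)
    print(caps.shape, v.shape)
# (144, 8) (10, 16)
# (324, 8) (10, 16)
```

Because `W` has shape `(J, d_out, d_in)` rather than holding one matrix per input capsule as in the original CapsNet, the parameter count does not depend on the image size, and each input capsule reaches the output through its coupling coefficients instead of being discarded by pooling. Class scores can then be read off as capsule lengths, for example `np.linalg.norm(v, axis=-1)`.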

Acknowledgements
This work was supported by the Natural Science Foundation of Hebei Province (NO. F2018201060) and the Post-graduate’s Innovation Fund Project of Hebei University (HBU2021ss059).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wang, Y., Li, K. & Lei, Y. A general multi-scale image classification based on shared conversion matrix routing. Appl Intell 52, 3249–3265 (2022). https://doi.org/10.1007/s10489-021-02558-1
DOI: https://doi.org/10.1007/s10489-021-02558-1