A general multi-scale image classification based on shared conversion matrix routing

Applied Intelligence

Abstract

In a deep convolutional neural network, the fully connected layer requires every input image to be resized to a fixed size. Stretching or cropping can bring an image to the required size, but these operations easily distort it. Methods based on groups of pooling layers convert variable-size feature maps into fixed-size ones; however, pooling discards information, which significantly reduces recognition accuracy. To address this problem, we propose the shared conversion matrix routing (SCMR) layer, a general network layer that replaces the fully connected layer of a convolutional neural network and enables the resulting network to handle multi-scale images without changing the original convolutional structure or parameters. Within the SCMR layer, a RECOMBINATION method dynamically increases or decreases the number of capsules according to the scale of the input image, so that the convolutional layers and the SCMR layer operate normally. In addition, a new dynamic routing algorithm shares the transformation matrix within the SCMR layer, allowing the layer to receive multi-dimensional convolutional feature maps and output fixed-size image features, which makes the classification of multi-scale images possible. The algorithm assigns a weight to each capsule, which avoids feature loss and improves the recognition rate. Furthermore, new capsules are created by increasing the dimensionality of the capsules, which addresses the exploding gradient problem during backpropagation. Experimental results on public datasets show that the proposed method achieves higher accuracy than current methods.
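
The abstract does not give the SCMR equations, so the following is only a minimal sketch under stated assumptions of why sharing a transformation matrix lets a routing layer accept a variable number of capsules. It assumes the routing-by-agreement procedure of Sabour et al. (2017) with one transformation matrix per output capsule, reused for every input capsule; this is one plausible reading of "sharing the transformation matrix", not the authors' formulation. The helper `feature_map_to_capsules` is a hypothetical stand-in for RECOMBINATION (one capsule per spatial position), the coupling coefficients play the role of per-capsule weights, and the capsule-dimension increase used against exploding gradients is not modeled. All names are illustrative.

```python
import math
import random


def squash(v):
    """Capsule non-linearity (Sabour et al., 2017): keeps direction, maps length into [0, 1)."""
    sq = sum(x * x for x in v)
    scale = sq / (1.0 + sq) / math.sqrt(sq + 1e-9)
    return [scale * x for x in v]


def matvec(m, u):
    """Multiply a (d_out x d_in) matrix, stored as a list of rows, by a length-d_in vector."""
    return [sum(w * x for w, x in zip(row, u)) for row in m]


def feature_map_to_capsules(fmap):
    """HYPOTHETICAL stand-in for RECOMBINATION: one capsule per spatial position.

    fmap is indexed [channel][row][col]. A larger image yields more positions and
    therefore more capsules; each capsule has dimension C (the channel count).
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(channels)]
            for i in range(height) for j in range(width)]


def shared_matrix_routing(capsules, w_shared, iterations=3):
    """Routing-by-agreement where w_shared[j] is reused for every input capsule (assumption).

    The parameter count and the output shape (one capsule per entry of w_shared)
    do not depend on len(capsules), i.e. on the input image size.
    """
    n_out, n_in = len(w_shared), len(capsules)
    # Prediction vectors: u_hat[i][j] = W_j @ u_i, with the same W_j for every i.
    u_hat = [[matvec(w_shared[j], u) for j in range(n_out)] for u in capsules]
    logits = [[0.0] * n_out for _ in range(n_in)]
    for it in range(iterations):
        # Coupling coefficients: softmax of each input capsule's logits over the outputs.
        coupling = []
        for row in logits:
            top = max(row)
            exps = [math.exp(x - top) for x in row]
            total = sum(exps)
            coupling.append([e / total for e in exps])
        # Output capsules: squash of the coupling-weighted sum of predictions.
        outputs = []
        for j in range(n_out):
            dim = len(u_hat[0][j])
            s_j = [sum(coupling[i][j] * u_hat[i][j][k] for i in range(n_in))
                   for k in range(dim)]
            outputs.append(squash(s_j))
        # Agreement update (skipped after the final iteration).
        if it < iterations - 1:
            for i in range(n_in):
                for j in range(n_out):
                    logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], outputs[j]))
    return outputs


if __name__ == "__main__":
    random.seed(0)
    channels, d_out, n_classes = 8, 4, 3
    w = [[[random.gauss(0.0, 0.1) for _ in range(channels)] for _ in range(d_out)]
         for _ in range(n_classes)]
    for height, width in [(4, 4), (7, 5)]:
        fmap = [[[random.random() for _ in range(width)] for _ in range(height)]
                for _ in range(channels)]
        caps = feature_map_to_capsules(fmap)
        out = shared_matrix_routing(caps, w)
        lengths = [round(math.sqrt(sum(x * x for x in v)), 3) for v in out]
        print(len(caps), "input capsules ->", len(out), "outputs, lengths", lengths)
```

Running the demo reports 16 input capsules for the 4×4 map and 35 for the 7×5 map, yet both produce three 4-dimensional output capsules from the same weights. A conventional fully connected layer, by contrast, has one weight per input position and so fixes the input size.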

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Acknowledgements

This work was supported by the Natural Science Foundation of Hebei Province (No. F2018201060) and the Post-graduate’s Innovation Fund Project of Hebei University (HBU2021ss059).

Author information

Corresponding author

Correspondence to Kai Li.

About this article

Cite this article

Wang, Y., Li, K. & Lei, Y. A general multi-scale image classification based on shared conversion matrix routing. Appl Intell 52, 3249–3265 (2022). https://doi.org/10.1007/s10489-021-02558-1

  • DOI: https://doi.org/10.1007/s10489-021-02558-1