Image classification based on self-distillation

Abstract

Convolutional neural networks have been widely used in many application scenarios. To extend their use to areas where accuracy is critical, researchers have sought to improve accuracy with deeper or wider network structures, which causes computation and storage costs to grow rapidly and lengthens response times. In this paper, we propose a self-distillation image classification algorithm that significantly improves performance while reducing training costs. In traditional self-distillation, because no teacher model provides guidance, the student model must improve its own ability to capture global information and focus on key features. We therefore improve the traditional self-distillation algorithm with a positional attention module and a residual block with attention. Experimental results show that the method achieves better performance than traditional knowledge distillation methods and attention networks.
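To make the self-distillation setup concrete, the sketch below (in PyTorch) illustrates the generic self-distillation training loss, in which the deepest classifier of a network acts as the teacher for shallower auxiliary classifiers attached to intermediate layers. This is only an illustrative reconstruction of the standard formulation, not the authors' exact objective: the function name, temperature T, weight alpha, and the restriction to logit-level distillation (omitting the proposed positional attention module and residual block with attention) are assumptions.

# Minimal sketch (assumed PyTorch) of a generic self-distillation loss.
# The deepest classifier head serves as the teacher for the shallower
# auxiliary heads; temperature T and weight alpha are illustrative choices.
import torch
import torch.nn.functional as F

def self_distillation_loss(exit_logits, labels, T=3.0, alpha=0.3):
    # exit_logits: list of logit tensors from shallow-to-deep heads;
    # the last entry is the deepest (teacher) head.
    teacher_logits = exit_logits[-1]
    # Hard-label cross-entropy for every classifier head.
    ce = sum(F.cross_entropy(z, labels) for z in exit_logits)
    # Softened teacher distribution, detached so the teacher head is
    # trained only by its own cross-entropy term.
    soft_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    # KL divergence from the teacher to each shallow head, scaled by T^2.
    kd = torch.zeros((), device=labels.device)
    for z in exit_logits[:-1]:
        kd = kd + F.kl_div(F.log_softmax(z / T, dim=1), soft_teacher,
                           reduction="batchmean") * (T * T)
    return (1 - alpha) * ce + alpha * kd

# Example: three exit heads, a batch of 8 images, 10 classes.
exit_logits = [torch.randn(8, 10, requires_grad=True) for _ in range(3)]
labels = torch.randint(0, 10, (8,))
loss = self_distillation_loss(exit_logits, labels)
loss.backward()

The sketch captures only the logit-distillation backbone of self-distillation; the full method described in the paper additionally incorporates the positional attention module and the attention-based residual block into the student branches.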

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61871278), the Fundamental Research Funds for the Central Universities (No. 2021SCU12061), and the Natural Science Foundation of Sichuan (No. 2022NSFSC0922).

Author information

Corresponding author

Correspondence to Xiaohai He.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, Y., Qing, L., He, X. et al. Image classification based on self-distillation. Appl Intell 53, 9396–9408 (2023). https://doi.org/10.1007/s10489-022-04008-y
