Abstract
Convolutional neural networks are widely used across application domains. To extend their use to areas where accuracy is critical, researchers have pursued deeper or wider network architectures, which incurs rapid growth in computation and storage costs and increases response latency. In this paper, we propose a self-distillation image classification algorithm that significantly improves performance while reducing training cost. In traditional self-distillation, the student model receives no guidance from a separate teacher model, so it must strengthen its own ability to capture global information and attend to key features. We therefore improve the traditional self-distillation algorithm with a positional attention module and an attention-based residual block. Experimental results show that the proposed method outperforms traditional knowledge distillation methods and attention networks.
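The abstract does not detail the training objective, so the sketch below is a minimal PyTorch illustration of the two ingredients it names: a position-attention block, in which every spatial location of a feature map attends to every other location, and a "be your own teacher"-style self-distillation loss, in which shallow branch classifiers learn from both the ground-truth labels and the softened predictions of the deepest classifier. All names and hyperparameters here (PositionAttention, self_distillation_loss, T, alpha) are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn.functional as F


class PositionAttention(torch.nn.Module):
    # Hypothetical position-attention block: each spatial location
    # aggregates global context from all other locations before the
    # feature map reaches a classifier head.
    def __init__(self, channels):
        super().__init__()
        self.query = torch.nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = torch.nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = torch.nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = torch.nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c//8)
        k = self.key(x).flatten(2)                     # (b, c//8, hw)
        attn = torch.softmax(q @ k, dim=-1)            # (b, hw, hw) spatial affinities
        v = self.value(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).reshape(b, c, h, w)
        return self.gamma * out + x                    # residual connection


def self_distillation_loss(branch_logits, labels, T=3.0, alpha=0.3):
    # branch_logits: logits of the shallow branch classifiers, with the
    # deepest classifier last; labels: ground-truth class indices.
    teacher_logits = branch_logits[-1]
    soft_teacher = F.softmax(teacher_logits / T, dim=1).detach()

    loss = F.cross_entropy(teacher_logits, labels)     # supervise the deepest head
    for student_logits in branch_logits[:-1]:
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        soft_teacher, reduction="batchmean") * (T * T)
        loss = loss + (1.0 - alpha) * hard + alpha * soft
    return loss

In practice, such an attention block would sit before each branch classifier of a ResNet-style backbone and the loss would be averaged over training batches; the placement and loss weighting used in the paper may differ.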
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61871278), the Fundamental Research Funds for the Central Universities (No. 2021SCU12061), and the Natural Science Foundation of Sichuan (No. 2022NSFSC0922).
About this article
Cite this article
Li, Y., Qing, L., He, X. et al. Image classification based on self-distillation. Appl Intell 53, 9396–9408 (2023). https://doi.org/10.1007/s10489-022-04008-y