Improving generalization in deep neural network using knowledge transformation based on fisher criterion

Morabbi, Sajedeh; Soltanizadeh, Hadi; Mozaffari, Saeed; Fadaeieslam, Mohammad Javad

doi:10.1007/s11227-023-05448-0

Published: 17 June 2023

Volume 79, pages 20899–20922, (2023)

The Journal of Supercomputing

Abstract

Most deep neural networks (DNNs) are trained in an over-parametrized regime, in which the number of parameters exceeds the number of available training samples; this reduces generalization capability and degrades performance on new, unseen samples. The generalization of DNNs has been improved by various methods, such as regularization techniques, data augmentation, network capacity restriction, and randomness injection. In this paper, we propose an effective generalization method, named multivariate statistical knowledge transformation, which learns the feature distribution so as to separate samples based on the variance of the deep hypothesis space in all dimensions. Moreover, the proposed method uses latent knowledge of the target to boost the confidence of its predictions. Compared with state-of-the-art methods, multivariate statistical knowledge transformation yields competitive results. Experimental results show that the proposed method achieves accuracies of 91.96%, 97.52%, and 99.21% on CIFAR-10, CIFAR-100, and Tiny ImageNet, respectively. Furthermore, the method enables faster convergence during the initial epochs.
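
For orientation, the classical multivariate Fisher criterion named in the title can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' loss: the abstract does not give their formulation, so the function name `fisher_criterion`, the ridge term `eps`, and the synthetic two-dimensional "features" are all hypothetical. The score is trace(S_w^{-1} S_b), the ratio of between-class scatter to within-class scatter taken over all feature dimensions.

```python
import numpy as np


def fisher_criterion(features, labels, eps=1e-6):
    """Classical Fisher score trace(S_w^{-1} S_b).

    features: array of shape (n_samples, n_dims)
    labels:   integer class labels of shape (n_samples,)
    eps:      small ridge added to S_w so that it stays invertible
              (an assumption made for numerical stability)
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n_dims = features.shape[1]
    overall_mean = features.mean(axis=0)

    s_w = np.zeros((n_dims, n_dims))  # within-class scatter
    s_b = np.zeros((n_dims, n_dims))  # between-class scatter
    for c in np.unique(labels):
        x_c = features[labels == c]
        mean_c = x_c.mean(axis=0)
        centered = x_c - mean_c
        s_w += centered.T @ centered
        diff = (mean_c - overall_mean)[:, None]
        s_b += x_c.shape[0] * (diff @ diff.T)

    s_w += eps * np.eye(n_dims)
    # Solve S_w X = S_b instead of forming the inverse explicitly.
    return float(np.trace(np.linalg.solve(s_w, s_b)))


# Synthetic stand-in for deep features: three classes in two dimensions.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 50)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
separated = centers[labels] + rng.normal(scale=0.5, size=(150, 2))
overlapping = 0.1 * centers[labels] + rng.normal(scale=0.5, size=(150, 2))

print("separated:  ", fisher_criterion(separated, labels))
print("overlapping:", fisher_criterion(overlapping, labels))
```

Features whose class means are far apart relative to the spread within each class receive a higher score, which is the sense in which such a criterion rewards separable feature distributions.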

Data availability

The datasets analyzed during the current study are included in this published article [22].

References

Hashemi AS, Mozaffari S, Alirezaee S (2022) Improving adversarial robustness of traffic sign image recognition networks. Displays. https://doi.org/10.1016/j.displa.2022.102277
Sitaula C, Hossain MB (2021) Attention-based VGG-16 model for COVID-19 chest X-ray image classification. Appl Intell 51:2850–2863. https://doi.org/10.1007/s10489-020-02055-x
Xi P, Guan H, Shu C et al (2020) An integrated approach for medical abnormality detection using deep patch convolutional neural networks. Vis Comput 36:1869–1882. https://doi.org/10.1007/s00371-019-01775-7
Jin B, Cruz L, Gonçalves N (2020) Deep facial diagnosis: deep transfer learning from face recognition to facial diagnosis. IEEE Access 8:123649–123661
Khosravanian A, Rahmanimanesh M, Keshavarzi P et al (2022) Level set method for automated 3D brain tumor segmentation using symmetry analysis and kernel induced fuzzy clustering. Multimed Tools Appl 81:21719–21740. https://doi.org/10.1007/s11042-022-12445-7
Wu JL, Chung WY (2022) Sentiment-based masked language modeling for improving sentence-level valence–arousal prediction. Appl Intell 52:16353–16369. https://doi.org/10.1007/s10489-022-03384-9
Willemink MJ, Koszek WA, Hardell C et al (2020) Preparing medical imaging data for machine learning. Radiology 295:4–15. https://doi.org/10.1148/radiol.2020192224
Ghadhab L, Jenhani I, Mkaouer MW, Ben Messaoud M (2021) Augmenting commit classification by using fine-grained source code changes and a pre-trained deep neural language model. Inf Softw Technol. https://doi.org/10.1016/j.infsof.2021.106566
Zheng Q, Zhao P, Li Y et al (2021) Spectrum interference-based two-level data augmentation method in deep learning for automatic modulation classification. Neural Comput Appl 33:7723–7745. https://doi.org/10.1007/s00521-020-05514-1
Pang T, Xu K, Dong Y, et al (2019) Rethinking softmax cross-entropy loss for adversarial robustness
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press
Kawaguchi K, Bengio Y, Kaelbling L (2022) Generalization in deep learning. Math Asp Deep Learn. https://doi.org/10.1017/9781009025096.003
Gong C, Ren T, Ye M, Liu Q (2021) MaxUp: lightweight adversarial training with data augmentation improves neural network training. In: Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2474–2483. https://doi.org/10.1109/CVPR46437.2021.00250
Sarwar Murshed MG, Carroll JJ, Khan N, Hussain F (2022) Efficient deployment of deep learning models on autonomous robots in the ROS environment. Deep Learn Appl 3:215–243. https://doi.org/10.1007/978-981-16-3357-7_9
Stanton S, Izmailov P, Kirichenko P et al (2021) Does knowledge distillation really work? Adv Neural Inf Process Syst 9:6906–6919
Zhang C, Bengio S, Hardt M et al (2021) Understanding deep learning (still) requires rethinking generalization. Commun ACM 64:107–115. https://doi.org/10.1145/3446776
Oymak S, Soltanolkotabi M (2019) Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: 36th Int Conf Mach Learn ICML 2019 2019-June:8707–8747
Gou J, Xiong X, Yu B et al (2023) Multi-target knowledge distillation via student self-reflection. Int J Comput Vis. https://doi.org/10.1007/s11263-023-01792-z
Coulibaly S, Kamsu-Foguem B, Kamissoko D, Traore D (2019) Deep neural networks with transfer learning in millet crop images. Comput Ind 108:115–120. https://doi.org/10.1016/j.compind.2019.02.003
Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network
Zheng Q, Zhao P, Zhang D, Wang H (2021) MR-DCAE: manifold regularization-based deep convolutional autoencoder for unauthorized broadcasting identification. Int J Intell Syst 36:7204–7238. https://doi.org/10.1002/int.22586
Zheng Q, Zhao P, Wang H et al (2022) Fine-grained modulation classification using multi-scale radio transformer with dual-channel representation. IEEE Commun Lett 26:1298–1302. https://doi.org/10.1109/LCOMM.2022.3145647
Ba LJ, Caruana R (2014) Do deep nets really need to be deep? Adv Neural Inf Process Syst 3:2654–2662
Zhang J (2017) Multivariate analysis and machine learning in cerebral palsy research. Front Neurol. https://doi.org/10.3389/fneur.2017.00715
Krizhevsky A (2009) Learning multiple layers of features from tiny images. In: … Sci Dep Univ Toronto, Tech … 1–60
Le Y, Yang X (2015) Tiny imagenet visual recognition challenge. Stanford CS231N
Neyshabur B, Li Z, Bhojanapalli S et al (2018) Towards understanding the role of over-parametrization in generalization of neural networks. ICLR 2019:1–20
Zheng Q, Tian X, Yang M et al (2020) PAC-Bayesian framework based drop-path method for 2D discriminative convolutional network pruning. Multidimens Syst Signal Process 31:793–827. https://doi.org/10.1007/s11045-019-00686-z
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd Int Conf Learn Represent ICLR 2015—Conf Track Proc 14
Ahn S, Hu SX, Damianou A, et al (2019) Variational information distillation for knowledge transfer. In: Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2019-June: 9155–9163. https://doi.org/10.1109/CVPR.2019.00938
Guo Q, Wang X, Wu Y, et al (2020) Online knowledge distillation via collaborative learning. In: Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. p 11017–11026. https://doi.org/10.1109/CVPR42600.2020.01103
Chen D, Mei JP, Wang C, et al (2020) Online knowledge distillation with diverse peers. In: AAAI 2020—34th AAAI Conference on Artificial Intelligence. p 3430–3437. https://doi.org/10.1609/aaai.v34i04.5746
Wen T, Lai S, Qian X (2021) Preparing lessons: Improve knowledge distillation with better supervision. Neurocomputing 454:25–33. https://doi.org/10.1016/j.neucom.2021.04.102
Yun S, Park J, Lee K, Shin J (2020) Regularizing class-wise predictions via self-knowledge distillation. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 13873–13882. https://doi.org/10.1109/CVPR42600.2020.01389
Zhou S, Tian S, Yu L et al (2023) FixMatch-LS: semi-supervised skin lesion classification with label smoothing. Biomed Signal Process Control 84:104709. https://doi.org/10.1016/j.bspc.2023.104709
Cao Y, Wan Q, Shen W, Gao L (2022) Informative knowledge distillation for image anomaly segmentation. Knowledge-Based Syst. https://doi.org/10.1016/j.knosys.2022.108846
Suh S, Rey VFLP (2023) Transformer-based adversarial learning for human activity recognition using wearable sensors via self-knowledge distillation. Knowledge-Based Syst. https://doi.org/10.1016/j.knosys.2022.110143
Zheng Z PX (2022) Self-guidance: improve deep neural network generalization via knowledge distillation. In: Proc IEEE/CVF Winter Conf Appl Comput Vis. p 3203–3212
Moutik O, Tigani S, Saadane RCA (2021) Hybrid deep learning vision-based models for human object interaction detection by knowledge distillation. Proc Comput Sci 192:5093–5103
Wu W, Zhou K, Chen XD, Yong JH (2022) Light-weight shadow detection via GCN-based annotation strategy and knowledge distillation. Comput Vis Image Underst. https://doi.org/10.1016/j.cviu.2021.103341
Zhu X, Gong S et al (2018) Knowledge distillation by on-the-fly native ensemble. Adv Neural Inf Process Syst. p 7517–7527
Qing H, Tang J, Yang X, Huang X, Zhu HJN (2022) Stimulates potential for knowledge distillation. In: Artificial Neural Networks and Machine Learning, 31st Int Conf Artif Neural Networks. p 187–198
Borza DL, Ileni TA, Marinescu AIDS (2023) Teacher or supervisor? Effective online knowledge distillation via guided collaborative learning. Comput Vis Image Underst 18:103632
Zhang S, Chen C, Hu XPS (2023) Balanced knowledge distillation for long-tailed learning. Neurocomputing. https://doi.org/10.1016/j.neucom.2023.01.063
Welling M (2007) Fisher linear discriminant analysis max. In: 2007 9th Int Symp Signal Process its Appl ISSPA 2007, Proc
Koutsoukas A, Monaghan KJ, Li X, Huan J (2017) Deep-learning: Investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. J Cheminform. https://doi.org/10.1186/s13321-017-0226-y
Dorfer M, Kelz R, Widmer G (2016) Deep linear discriminant analysis. In: 4th Int Conf Learn Represent ICLR 2016—Conf Track Proc
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2016-Decem: 770–778
Zhang Y, Xiang T, Hospedales TM, Lu H (2018) Deep mutual learning. In: Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. p 4320–4328
Kim J, Park SU, Kwak N (2018) Paraphrasing complex network: Network compression via factor transfer. Adv Neural Inf Process Syst. p 2760–2769

Funding

The authors declare that no funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran
Sajedeh Morabbi, Hadi Soltanizadeh, Saeed Mozaffari & Mohammad Javad Fadaeieslam

Contributions

All authors contributed to the design and implementation of the research, the analysis of the results, and the writing of the manuscript.

Corresponding author

Correspondence to Hadi Soltanizadeh.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Morabbi, S., Soltanizadeh, H., Mozaffari, S. et al. Improving generalization in deep neural network using knowledge transformation based on fisher criterion. J Supercomput 79, 20899–20922 (2023). https://doi.org/10.1007/s11227-023-05448-0

Accepted: 29 May 2023
Published: 17 June 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s11227-023-05448-0