Abstract
Deep learning systems use hierarchical models to learn high-level features from low-level ones, and the field has advanced rapidly in recent years. The robustness of learning systems with deep architectures, however, has rarely been studied and needs further investigation. In particular, the mean square error (MSE), a commonly used optimization cost in deep learning, is highly sensitive to outliers (impulsive noise). Robust methods are therefore needed to improve learning performance and to suppress the harmful influence of outliers, which are pervasive in real-world data. In this paper, we propose an efficient and robust deep learning model based on stacked auto-encoders and the Correntropy-induced loss function (CLF), called CLF-based stacked auto-encoders (CSAE). CLF, a nonlinear similarity measure, is robust to outliers and can approximate different norms of the data (from \(l_0\) to \(l_2\)); in essence, CLF is the MSE computed in a reproducing kernel Hilbert space. Unlike conventional stacked auto-encoders, which generally use the MSE as the reconstruction loss and the KL divergence as the sparsity penalty, both the reconstruction loss and the sparsity penalty in CSAE are built with CLF. The fine-tuning procedure in CSAE is also based on CLF, which further enhances learning performance. The efficiency and robustness of the proposed model are confirmed by experiments on the MNIST benchmark dataset.
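To illustrate why a correntropy-based loss is less sensitive to impulsive noise than the MSE, the following Python sketch compares the two on data containing a single outlier. It is a minimal illustration, not the paper's implementation: the Gaussian kernel form, the kernel width `sigma`, and the per-element normalization are illustrative assumptions.

```python
import numpy as np

def clf_loss(x, x_hat, sigma=1.0):
    """Correntropy-induced loss between target x and reconstruction x_hat.

    Each per-element error e is mapped through 1 - exp(-e^2 / (2 * sigma^2)),
    so large (outlier) errors saturate at 1 instead of growing quadratically,
    while small errors behave approximately like a scaled squared error.
    """
    e = x - x_hat
    return np.mean(1.0 - np.exp(-(e ** 2) / (2.0 * sigma ** 2)))

def mse_loss(x, x_hat):
    return np.mean((x - x_hat) ** 2)

# A single impulsive error dominates the MSE but barely moves the CLF value.
x = np.zeros(100)
x_hat = np.zeros(100)
x_hat[0] = 50.0               # one corrupted component
print(mse_loss(x, x_hat))     # 25.0   -> driven entirely by the outlier
print(clf_loss(x, x_hat))     # ~0.01  -> bounded contribution per sample
```

In a CSAE-style model, a loss of this form would replace the MSE reconstruction term (and, analogously, the KL sparsity penalty) in each auto-encoder layer and in the fine-tuning objective.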
Acknowledgments
This work was supported by the 973 Program (No. 2015CB351703), the 863 Project (No. 2014AA01A701), and the National Natural Science Foundation of China (Nos. 61372152 and 61371087).