Abstract
Deep learning systems use hierarchical models to learn high-level features from low-level ones, and the field has advanced rapidly in recent years. The robustness of learning systems with deep architectures, however, has rarely been studied and needs further investigation. In particular, the mean square error (MSE), a commonly used optimization cost in deep learning, is highly sensitive to outliers (impulsive noise). Robust methods are therefore needed to improve learning performance and to suppress the harmful influence of outliers, which are pervasive in real-world data. In this paper, we propose an efficient and robust deep learning model based on stacked auto-encoders and the Correntropy-induced loss function (CLF), called CLF-based stacked auto-encoders (CSAE). CLF, a nonlinear similarity measure, is robust to outliers and can approximate different norms of the data (from \(l_0\) to \(l_2\)); in essence, CLF is the MSE computed in a reproducing kernel Hilbert space. Unlike conventional stacked auto-encoders, which generally use the MSE as the reconstruction loss and the KL divergence as the sparsity penalty, both the reconstruction loss and the sparsity penalty in CSAE are built with CLF. The fine-tuning procedure in CSAE is also based on CLF, which further enhances learning performance. The efficiency and robustness of the proposed model are confirmed by experiments on the MNIST benchmark dataset.
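To illustrate why a correntropy-based loss is less sensitive to impulsive noise than the MSE, the following Python sketch compares the two on data containing a single outlier. It is a minimal illustration, not the paper's implementation: the Gaussian kernel form, the kernel width `sigma`, and the per-element normalization are illustrative assumptions.

```python
import numpy as np

def clf_loss(x, x_hat, sigma=1.0):
    """Correntropy-induced loss between target x and reconstruction x_hat.

    Each per-element error e is mapped through 1 - exp(-e^2 / (2 * sigma^2)),
    so large (outlier) errors saturate at 1 instead of growing quadratically,
    while small errors behave approximately like a scaled squared error.
    """
    e = x - x_hat
    return np.mean(1.0 - np.exp(-(e ** 2) / (2.0 * sigma ** 2)))

def mse_loss(x, x_hat):
    return np.mean((x - x_hat) ** 2)

# A single impulsive error dominates the MSE but barely moves the CLF value.
x = np.zeros(100)
x_hat = np.zeros(100)
x_hat[0] = 50.0               # one corrupted component
print(mse_loss(x, x_hat))     # 25.0   -> driven entirely by the outlier
print(clf_loss(x, x_hat))     # ~0.01  -> bounded contribution per sample
```

In a CSAE-style model, a loss of this form would replace the MSE reconstruction term (and, analogously, the KL sparsity penalty) in each auto-encoder layer and in the fine-tuning objective.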
Acknowledgments
This work was supported by the 973 Program (No. 2015CB351703), the 863 Project (No. 2014AA01A701), and the National Natural Science Foundation of China (Nos. 61372152 and 61371087).