Abstract
Classical linear neural network architectures, such as the optimal linear associative memory (OLAM) (Kohonen and Ruohonen, IEEE Trans Comput 22(7):701–702, 1973) and the adaptive linear element (Adaline) (Widrow, IEEE Signal Process Mag 22(1):100–106, 2005; Widrow and Winter, IEEE Comput 21(3):25–39, 1988), are commonly used either as standalone pattern classifiers for linearly separable problems or as fundamental building blocks of multilayer nonlinear classifiers, such as the multilayer perceptron (MLP), the radial basis function network (RBFN), the extreme learning machine (ELM) (Huang et al., Int J Mach Learn Cybern 2:107–122, 2011) and the echo-state network (ESN) (Emmerich et al., Proceedings of the 20th international conference on artificial neural networks, pp 148–153, 2010). A common feature shared by the learning rules of the OLAM and the Adaline, namely the ordinary least squares (OLS) and the least mean squares (LMS) algorithms, respectively, is that they are optimal only under the assumption of Gaussian errors. However, the presence of outliers in the data causes the error distribution to depart from Gaussianity, and the classifier's performance deteriorates accordingly. Bearing this in mind, in this paper we develop simple and efficient extensions of the OLAM and the Adaline, named Robust OLAM (ROLAM) and Robust Adaline (Radaline), which are robust to labeling errors (a.k.a. label noise), a type of outlier that often occurs in classification tasks. Such outliers usually result from mistakes made while labeling the data points (e.g. the misjudgment of a specialist) or from typing errors made while creating the data files (e.g. striking an incorrect key on a keyboard). To deal with these outliers, the ROLAM and the Radaline use \(M\)-estimators to compute the weights of the OLAM and Adaline networks, instead of the standard OLS/LMS algorithms. By means of comprehensive computer simulations using synthetic and real-world data sets, we show that the proposed robust linear classifiers consistently outperform their original versions.
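To make the central idea concrete, the sketch below replaces the OLS solution of a linear classifier's weights with a Huber \(M\)-estimate computed by iteratively reweighted least squares (IRLS), the standard numerical route to \(M\)-estimators. This is a minimal illustration under our own assumptions (the function names, the tuning constant \(k = 1.345\), and the MAD-based scale estimate are ours), not the authors' exact implementation.

```python
import numpy as np

def huber_weights(residuals, k=1.345):
    """Huber weight function: 1 for |r| <= k, k/|r| otherwise (r = scaled residual)."""
    scale = 1.4826 * np.median(np.abs(residuals)) + 1e-12  # robust scale via MAD
    r = residuals / scale
    w = np.ones_like(r)
    big = np.abs(r) > k
    w[big] = k / np.abs(r[big])
    return w

def robust_linear_fit(X, D, n_iter=20, tol=1e-6):
    """M-estimate of the weights of a linear classifier via IRLS.

    X : (N, p) input matrix (first column equal to 1 for the bias),
    D : (N, c) target matrix (one label column per output neuron).
    Returns B : (p, c) weight matrix.
    """
    B = np.linalg.lstsq(X, D, rcond=None)[0]      # OLS solution as starting point
    for _ in range(n_iter):
        B_old = B.copy()
        for i in range(D.shape[1]):               # one IRLS problem per output neuron
            r = D[:, i] - X @ B[:, i]             # current residuals
            w = huber_weights(r)                  # downweight large residuals
            Xw = X * w[:, None]                   # weighted design matrix
            B[:, i] = np.linalg.solve(X.T @ Xw, Xw.T @ D[:, i])
        if np.linalg.norm(B - B_old) < tol:
            break
    return B
```

A mislabeled pattern produces a large residual and therefore receives a weight \(k/|r| < 1\), so it pulls the solution far less than under OLS; when no outliers are present, nearly all weights stay at 1 and the estimate approaches the ordinary least-squares solution.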
Notes
Also known as the delta learning rule or the Widrow–Hoff learning rule [34].
The first component of \(\mathbf {x}_{n}\) is set equal to 1 in order to include the bias term.
In other words, at iteration \(n\) or, equivalently, at the presentation of the \(n\)-th input pattern.
The \(H_{\infty }\) criterion was originally introduced in the control theory literature as a means to ensure robust performance in the face of model uncertainties and a lack of statistical information on the exogenous signals.
www.mathworks.com (Matlab) and www.R-project.org (R).
Spondylolisthesis is the displacement of a vertebra or the vertebral column in relation to the vertebrae below.
References
Akusok A, Veganzones D, Miche Y, Severin E, Lendasse A (2014) Finding originally mislabeled samples with MD-ELM. In: Proceedings of the 22nd european symposium on artificial neural networks, computational intelligence and machine learning (ESANN’2014), pp 689–694
Alpaydin E, Jordan MI (1996) Local linear perceptrons for classification. IEEE Trans Neural Netw 7(3):788–792
Anderson J (1972) A simple neural network generating an interactive memory. Math Biosci 14(3–4):197–220
Ayad O (2014) Learning under concept drift with SVM. In: Proceedings of the 24th international conference on artificial neural networks (ICANN’2014), vol LNCS 8681, pp 587–594
Bolzern P, Colaneri P, De Nicolao G (1999) H\(_\infty \)-robustness of adaptive filters against measurement noise and parameter drift. Automatica 35(9):1509–1520
Chan SC, Zhou Y (2010) On the performance analysis of the least mean \({M}\)-estimate and normalized least mean \({M}\)-estimate algorithms with Gaussian inputs and additive Gaussian and contaminated Gaussian noises. J Signal Process Syst 80(1):81–103
Chatterjee S, Hadi AS (1986) Influential observations, high leverage points, and outliers in linear regression. Stat Sci 1(3):379–393
Cherkassky V, Fassett K, Vassilas N (1991) Linear algebra approach to neural associative memories and noise performance of neural classifiers. IEEE Trans Comput 40(12):1429–1435
Dasgupta S, Kalai AT, Monteleoni C (2009) Analysis of perceptron-based active learning. J Mach Learn Res 10:281–299
Duda RO, Hart PE, Stork DG (2006) Pattern classification, 2nd edn. Wiley, New York
Eichmann G, Kasparis T (1989) Pattern classification using a linear associative memory. Pattern Recogn 22(6):733–740
Emmerich C, Reinhart F, Steil J (2010) Recurrence enhances the spatial encoding of static inputs in reservoir networks. In: Proceedings of the 20th international conference on artificial neural networks, vol LNCS 6353, Springer, pp 148–153
Fox J (2002) An R and S-PLUS companion to applied regression. Sage Publications, Thousand Oaks
Frank A, Asuncion A (2010) UCI machine learning repository. URL http://archive.ics.uci.edu/ml
Frenay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst 25(5):845–869
Freund Y, Schapire RE (1999) Large margin classification using the perceptron algorithm. Mach Learn 37(3):277–296
Frieß T-T, Harrison RF (1999) A kernel-based Adaline for function approximation. Intell Data Anal 3(4):307–313
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
Hassibi B, Sayed AH, Kailath T (1994) H\(_\infty \) optimality criteria for LMS and backpropagation. In: Cowan JD, Tesauro G, Alspector J (eds) Advances in neural information processing systems 6. Morgan Kaufmann, San Mateo, pp 351–358
Hassibi B, Sayed AH, Kailath T (1996) H\(_\infty \) optimality of the LMS algorithm. IEEE Trans Signal Process 44(2):267–280
Haykin S (2008) Neural networks and learning machines, 3rd edn. Prentice-Hall, New Jersey
Huang G-B, Wang DH, Lan Y (2011) Extreme learning machines: a survey. Int J Mach Learn Cybern 2:107–122
Huber PJ (1964) Robust estimation of a location parameter. Annal Math Stat 35(1):73–101
Huber PJ, Ronchetti EM (2009) Robust statistics, 2nd edn. Wiley, New York
Hyvärinen A, Oja E (2000) Independent component analysis: algorithms and applications. Neural Netw 13(4–5):411–430
Kavak A, Yigit H, Ertunc HM (2005) Using Adaline neural network for performance improvement of smart antennas in TDD wireless communications. IEEE Trans Neural Netw 16(6):1616–1625
Kim H-C, Ghahramani Z (2008) Outlier robust Gaussian process classification. In: Proceedings of the 2008 joint IAPR international workshop on structural, syntactic, and statistical pattern recognition (SSPR’08), pp 896–905
Kohonen T (1989) Self-organization and associative memory. Springer-Verlag, Berlin
Kohonen T, Ruohonen M (1973) Representation of associated data by matrix operators. IEEE Trans Comput 22(7):701–702
Liu W, Pokharel P, Principe J (2008) The kernel least-mean-square algorithm. IEEE Trans Signal Process 56(2):543–554
Nakano K (1972) Associatron: a model of associative memory. IEEE Trans Syst Man Cybern SMC–2(3):380–388
Oja E (1992) Principal components, minor components and linear neural networks. Neural Netw 5:927–935
Poggio T, Girosi F (1990) Networks for approximation and learning. Proc IEEE 78(9):1481–1497
Principe JC, Euliano NR, Lefebvre WC (2000) Neural and adaptive systems: fundamentals through simulations. Wiley, New York
Rousseeuw PJ, Leroy AM (1987) Robust regression and outlier detection. Wiley, New York
Stevens JP (1984) Outliers and influential data points in regression analysis. Psychol Bull 95(2):334–344
Webb A (2002) Statistical pattern recognition, 2nd edn. Wiley, New York
Widrow B (2005) Thinking about thinking: the discovery of the LMS algorithm. IEEE Signal Process Mag 22(1):100–106
Widrow B, Kamenetsky M (2003) Statistical efficiency of adaptive algorithms. Neural Netw 16(5–6):735–744
Widrow B, Winter R (1988) Neural nets for adaptive filtering and adaptive pattern recognition. IEEE Comput 21(3):25–39
Williamson GA, Clarkson PM, Sethares WA (1993) Performance characteristics of the median LMS adaptive filter. IEEE Trans Signal Process 41(2):667–680
Wu Y, Liu Y (2007) Robust truncated hinge loss support vector machines. J Am Stat Assoc 102(479):974–983
Zhu X, Wu X (2004) Class noise versus attribute noise: a quantitative study. Artif Intell Rev 22(3):177–210
Zou Y, Chan SC, Ng TS (2000) Least mean \(M\)-estimate algorithms for robust adaptive filtering in impulsive noise. IEEE Trans Circuits Syst II 47(12):1564–1569
Acknowledgments
The authors thank CNPq (Grant 309841/2012-7) for the financial support and NUTEC (Fundação Núcleo de Tecnologia Industrial do Ceará) for providing the laboratory infrastructure for the execution of the research activities reported in this paper. We also thank Mr. César Lincoln Mattos and Mr. José Daniel Santos for their kind help in generating the results for the KLMS classifier.
Appendix
By applying a nonlinear transformation to the input data, it is possible to obtain a nonlinear classifier from the same error function in Eq. (9). In a kernel context, the KLMS algorithm [30] operates on the feature space obtained by applying a mapping \(\Phi (\cdot )\) to the inputs, generating a new sequence of input-output pairs \(\{(\Phi (\varvec{x}_n), \mathbf {d}_n)\}_{n=1}^N\). Weight updating is similar to the LMS rule shown in Eq. (10):

\[ \hat{\varvec{\beta }}_{i,n+1} = \hat{\varvec{\beta }}_{i,n} + \mu \, e_{i,n} \Phi (\varvec{x}_n). \]
Considering \(\hat{{\varvec{\beta }}}_{i,0} = \varvec{0}\), where \(\varvec{0}\) is the null vector, after \(N\) iterations we get

\[ \hat{\varvec{\beta }}_{i,N} = \mu \sum _{n=1}^{N} e_{i,n} \Phi (\varvec{x}_n), \]

so that the output of the \(i\)-th neuron for the input \(\varvec{x}_N\) can be written as

\[ y_{i,N} = \hat{\varvec{\beta }}_{i,N}^{T}\Phi (\varvec{x}_N) = \mu \sum _{n=1}^{N} e_{i,n}\, \kappa (\varvec{x}_n, \varvec{x}_N), \qquad (30) \]
where \(\kappa (\varvec{x}_n, \varvec{x}_N) = \Phi (\varvec{x}_n)^T\Phi (\varvec{x}_N)\) is a positive-definite kernel function. It should be noted that only Eq. (30) is needed, both for training and for testing. Although the weight vector itself never needs to be computed explicitly, the a priori errors \(e_{i,n},\ n \in \{1, \ldots , N\}\), and the training inputs \(\varvec{x}_n,\ n \in \{1, \ldots , N\}\), must be stored for prediction purposes.
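A minimal sketch of this procedure for a single output neuron is given below; the Gaussian kernel choice, the step size \(\mu = 0.5\), and the kernel width are illustrative assumptions, not values prescribed by the paper (which would use one such expansion per output neuron).

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Positive-definite Gaussian kernel kappa(x, y)."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

class KLMS:
    """Kernel LMS for one output neuron: stores the a priori errors
    and the training inputs, as required by the expansion in Eq. (30)."""

    def __init__(self, mu=0.5, sigma=1.0):
        self.mu = mu          # step size
        self.sigma = sigma    # kernel width
        self.errors = []      # a priori errors e_n
        self.centers = []     # training inputs x_n

    def predict(self, x):
        # y = mu * sum_n e_n * kappa(x_n, x)  -- the kernel expansion of Eq. (30)
        return self.mu * sum(
            e * gaussian_kernel(c, x, self.sigma)
            for e, c in zip(self.errors, self.centers)
        )

    def train(self, X, d):
        # Single pass over the training pairs (x_n, d_n)
        for x_n, d_n in zip(X, d):
            e_n = d_n - self.predict(x_n)  # a priori error
            self.errors.append(e_n)
            self.centers.append(x_n)
```

Training is a single pass over the data: each pattern contributes one term to the kernel expansion, so the prediction cost grows linearly with the number of training samples, which is precisely why the \(N\) a priori errors and training inputs must be kept.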
Cite this article
Barreto, G.A., Barros, A.L.B.P. On the Design of Robust Linear Pattern Classifiers Based on \(M\)-Estimators. Neural Process Lett 42, 119–137 (2015). https://doi.org/10.1007/s11063-014-9393-2