On the Design of Robust Linear Pattern Classifiers Based on \(M\)-Estimators

Neural Processing Letters

Abstract

Classical linear neural network architectures, such as the optimal linear associative memory (OLAM) (Kohonen and Ruohonen, IEEE Trans Comput 22(7):701–702, 1973) and the adaptive linear element (Adaline) (Widrow, IEEE Signal Process Mag 22(1):100–106, 2005; Widrow and Winter, IEEE Comput 21(3):25–39, 1988), are commonly used either as standalone pattern classifiers for linearly separable problems or as fundamental building blocks of multilayer nonlinear classifiers, such as the multilayer perceptron (MLP), the radial basis function network (RBFN), the extreme learning machine (ELM) (Huang et al., Int J Mach Learn Cybern 2:107–122, 2011) and the echo-state network (ESN) (Emmerich et al., Proceedings of the 20th international conference on artificial neural networks, pp 148–153, 2010). A common feature shared by the learning equations of the OLAM and the Adaline, respectively the ordinary least squares (OLS) and the least mean squares (LMS) algorithms, is that they are optimal only under the assumption of Gaussianity of the errors. However, the presence of outliers in the data causes the error distribution to depart from Gaussianity, and the classifier performance deteriorates accordingly. Bearing this in mind, in this paper we develop simple and efficient extensions of the OLAM and the Adaline, named Robust OLAM (ROLAM) and Robust Adaline (Radaline), which are robust to labeling errors (a.k.a. label noise), a type of outlier that often occurs in classification tasks. This type of outlier usually results from mistakes made while labeling the data points (e.g. the misjudgment of a specialist) or from typing errors made while creating the data files (e.g. by striking an incorrect key on the keyboard). To deal with such outliers, the ROLAM and the Radaline use \(M\)-estimators to compute the weights of the OLAM and Adaline networks, instead of the standard OLS/LMS algorithms. By means of comprehensive computer simulations using synthetic and real-world data sets, we show that the proposed robust linear classifiers consistently outperform their original versions.
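
As a concrete illustration of the approach summarized above, the sketch below fits a linear classifier with a Huber \(M\)-estimator via iteratively reweighted least squares (IRLS). This is only a minimal sketch of the general technique: the function names, the MAD-based scale estimate, the tuning constant \(k=1.345\) and the fixed number of IRLS iterations are illustrative assumptions, not the authors' exact ROLAM/Radaline formulation.

```python
import numpy as np

def huber_weights(residuals, k=1.345):
    """Huber M-estimator weights; k = 1.345 is a common (assumed) tuning constant."""
    # Robust scale estimate via the median absolute deviation (MAD).
    scale = 1.4826 * np.median(np.abs(residuals - np.median(residuals))) + 1e-12
    r = residuals / scale
    w = np.ones_like(r)
    large = np.abs(r) > k
    w[large] = k / np.abs(r[large])  # downweight samples with large residuals
    return w

def robust_linear_fit(X, d, n_iter=20):
    """Fit a linear model d ~ [1, X] @ beta by IRLS with Huber weights."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])  # bias column
    d = np.asarray(d, dtype=float)
    beta, *_ = np.linalg.lstsq(X, d, rcond=None)            # OLS initialization
    for _ in range(n_iter):
        w = huber_weights(d - X @ beta)                      # reweight by residuals
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], d * sw, rcond=None)
    return beta
```

With a \(\pm 1\) target encoding, a test pattern \(\mathbf{x}\) is then classified with the sign of \([1,\, \mathbf{x}^{T}]\,\boldsymbol{\beta}\); the Huber weights shrink the influence of patterns with large residuals, which is what mitigates the effect of mislabeled training points.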


Notes

  1. Also known as the delta learning rule or the Widrow–Hoff learning rule [34].

  2. The first component of \(\mathbf {x}_{n}\) is set equal to 1 in order to include the bias term.

  3. In other words, at iteration \(n\) or, equivalently, at the presentation of the \(n\)-th input pattern.

  4. The \(H_{\infty }\) criterion was originally introduced in the control theory literature as a means to ensure robust performance in the face of model uncertainties and of a lack of statistical information on the exogenous signals.

  5. www.mathworks.com (Matlab) and www.R-project.org (R).

  6. www.4shared.com/zip/HCELAcCLce/Robust_linear_classifiers.html.

  7. Spondylolisthesis is the displacement of a vertebra or the vertebral column in relation to the vertebrae below.

References

  1. Akusok A, Veganzones D, Miche Y, Severin E, Lendasse A (2014) Finding originally mislabels with MD-ELM. In: Proceedings of the 22nd European symposium on artificial neural networks, computational intelligence and machine learning (ESANN'2014), pp 689–694

  2. Alpaydin E, Jordan MI (1996) Local linear perceptrons for classification. IEEE Trans Neural Netw 7(3):788–792


  3. Anderson J (1972) A simple neural network generating an interactive memory. Math Biosci 14(3–4):197–220


  4. Ayad O (2014) Learning under concept drift with SVM. In: Proceedings of the 24th international conference on artificial neural networks (ICANN’2014), vol LNCS 8681, pp 587–594

  5. Bolzern P, Colaneri P, De Nicolao G (1999) H\(_\infty \)-robustness of adaptive filters against measurement noise and parameter drift. Automatica 35(9):1509–1520


  6. Chan SC, Zhou Y (2010) On the performance analysis of the least mean \({M}\)-estimate and normalized least mean \({M}\)-estimate algorithms with Gaussian inputs and additive Gaussian and contaminated Gaussian noises. J Signal Process Syst 80(1):81–103

  7. Chatterjee S, Hadi AS (1986) Influential observations, high leverage points, and outliers in linear regression. Stat Sci 1(3):379–393


  8. Cherkassky V, Fassett K, Vassilas N (1991) Linear algebra approach to neural associative memories and noise performance of neural classifiers. IEEE Trans Comput 40(12):1429–1435


  9. Dasgupta S, Kalai AT, Monteleoni C (2009) Analysis of perceptron-based active learning. J Mach Learn Res 10:281–299


  10. Duda RO, Hart PE, Stork DG (2006) Pattern classification, 2nd edn. Wiley, New York


  11. Eichmann G, Kasparis T (1989) Pattern classification using a linear associative memory. Pattern Recogn 22(6):733–740


  12. Emmerich C, Reinhart F, Steil J (2010) Recurrence enhances the spatial encoding of static inputs in reservoir networks. In: Proceedings of the 20th international conference on artificial neural networks, vol LNCS 6353, Springer, pp 148–153

  13. Fox J (2002) An R and S-PLUS companion to applied regression. Sage Publications, Thousand Oaks


  14. Frank A, Asuncion A (2010) UCI machine learning repository. URL http://archive.ics.uci.edu/ml

  15. Frenay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst 25(5):845–869


  16. Freund Y, Schapire RE (1999) Large margin classification using the perceptron algorithm. Mach Learn 37(3):277–296


  17. Frieß T-T, Harrison RF (1999) A kernel-based Adaline for function approximation. Intell Data Anal 3(4):307–313


  18. Golub GH, van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore

  19. Hassibi B, Sayed AH, Kailath T (1994) H\(_\infty \) optimality criteria for LMS and backpropagation. In: Cowan JD, Tesauro G, Alspector J (eds) Advances in neural information processing systems 6. Morgan Kaufmann, San Mateo, pp 351–358

  20. Hassibi B, Sayed AH, Kailath T (1996) H\(_\infty \) optimality of the LMS algorithm. IEEE Trans Signal Process 44(2):267–280

  21. Haykin S (2008) Neural networks and learning machines, 3rd edn. Prentice-Hall, New Jersey


  22. Huang G-B, Wang DH, Lan Y (2011) Extreme learning machines: a survey. Int J Mach Learn Cybern 2:107–122


  23. Huber PJ (1964) Robust estimation of a location parameter. Ann Math Stat 35(1):73–101

  24. Huber PJ, Ronchetti EM (2009) Robust statistics, 2nd edn. Wiley, New York

  25. Hyvärinen A, Oja E (2000) Independent component analysis: algorithms and applications. Neural Netw 13(4–5):411–430


  26. Kavak A, Yigit H, Ertunc HM (2005) Using Adaline neural network for performance improvement of smart antennas in TDD wireless communications. IEEE Trans Neural Netw 16(6):1616–1625

  27. Kim H-C, Ghahramani Z (2008) Outlier robust Gaussian process classification. In: Proceedings of the 2008 joint IAPR international workshop on structural, syntactic, and statistical pattern recognition (SSPR'08), pp 896–905

  28. Kohonen T (1989) Self-organization and associative memory. Springer-Verlag, Berlin


  29. Kohonen T, Ruohonen M (1973) Representation of associated data by matrix operators. IEEE Trans Comput 22(7):701–702


  30. Liu W, Pokharel P, Principe J (2008) The kernel least-mean-square algorithm. IEEE Trans Signal Process 56(2):543–554


  31. Nakano K (1972) Associatron: a model of associative memory. IEEE Trans Syst Man Cybern SMC–2(3):380–388


  32. Oja E (1992) Principal components, minor components and linear neural networks. Neural Netw 5:927–935


  33. Poggio T, Girosi F (1990) Networks for approximation and learning. Proc IEEE 78(9):1481–1497


  34. Principe JC, Euliano NR, Lefebvre WC (2000) Neural and adaptive systems: fundamentals through simulations. Wiley, New York


  35. Rousseeuw PJ, Leroy AM (1987) Robust regression and outlier detection. Wiley, New York


  36. Stevens JP (1984) Outliers and influential data points in regression analysis. Psychol Bull 95(2):334–344


  37. Webb A (2002) Statistical pattern recognition, 2nd edn. Wiley, New York


  38. Widrow B (2005) Thinking about thinking: the discovery of the LMS algorithm. IEEE Signal Process Mag 22(1):100–106


  39. Widrow B, Kamenetsky M (2003) Statistical efficiency of adaptive algorithms. Neural Netw 16(5–6):735–744


  40. Widrow B, Winter R (1988) Neural nets for adaptive filtering and adaptive pattern recognition. IEEE Comput 21(3):25–39


  41. Williamson GA, Clarkson PM, Sethares WA (1993) Performance characteristics of the median LMS adaptive filter. IEEE Trans Signal Process 41(2):667–680


  42. Wu Y, Liu Y (2007) Robust truncated hinge loss support vector machines. J Am Stat Assoc 102(479):974–983

  43. Zhu X, Wu X (2004) Class noise versus attribute noise: a quantitative study. Artif Intell Rev 22(3):177–210


  44. Zou Y, Chan SC, Ng TS (2000) Least mean \(M\)-estimate algorithms for robust adaptive filtering in impulsive noise. IEEE Trans Circuits Syst II 47(12):1564–1569



Acknowledgments

The authors thank CNPq (Grant 309841/2012-7) for the financial support and NUTEC (Fundação Núcleo de Tecnologia Industrial do Ceará) for providing the laboratory infrastructure for the execution of the research activities reported in this paper. We also thank Mr. César Lincoln Mattos and José Daniel Santos for their kind help in generating the results for the KLMS classifier.

Author information


Corresponding author

Correspondence to Guilherme A. Barreto.

Appendix

By applying a nonlinear transformation to the input data, it is possible to obtain a nonlinear classifier from the same error function in Eq. (9). In a kernel context, the KLMS algorithm [30] operates on the feature space obtained by applying a mapping \(\Phi (\cdot )\) to the inputs, generating a new sequence of input-output pairs \(\{(\Phi (\varvec{x}_n), \mathbf {d}_n)\}_{n=1}^N\). The weight update is similar to the LMS rule shown in Eq. (10):

$$\begin{aligned} \hat{\varvec{\beta }}_{i,n+1} = \hat{\varvec{\beta }}_{i,n} + \eta e_{in}\Phi (\varvec{x}_n). \end{aligned}$$
(28)

Considering \(\hat{{\varvec{\beta }}}_{i,0} = \varvec{0}\), where \(\varvec{0}\) is the null-vector, after \(N\) iterations we get

$$\begin{aligned} \hat{{\varvec{\beta }}}_{i,N}&= \eta \sum _{n=1}^{N-1} e_{in} \Phi (\varvec{x}_n), \end{aligned}$$
(29)
$$\begin{aligned} \hat{y}_{i,N}&= \hat{{\varvec{\beta }}}_{i,N}^T\Phi (\varvec{x}_N) = \eta \sum _{n=1}^{N-1} e_{in} \kappa (\varvec{x}_n, \varvec{x}_N), \end{aligned}$$
(30)

where \(\kappa (\varvec{x}_n, \varvec{x}_N) = \Phi (\varvec{x}_n)^T\Phi (\varvec{x}_N)\) is a positive-definite kernel function. It should be noted that only Eq. (30) is needed, both for training and for testing. Although the weight vector itself never needs to be computed explicitly, the a priori errors \(e_{in}\), \(n \in \{1, \ldots , N\}\), and the training inputs \(\varvec{x}_n\), \(n \in \{1, \ldots , N\}\), must be stored for prediction purposes.
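
For illustration, the fragment below is a minimal sketch of the kernel expansion in Eqs. (28)–(30), assuming a Gaussian kernel; the kernel width \(\sigma\), the step size value and the function names are illustrative assumptions and not part of the original KLMS specification in [30].

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # kappa(x, y) = exp(-||x - y||^2 / (2 sigma^2)); sigma is an assumed width.
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2.0 * sigma ** 2))

def klms_train(X, d, eta=0.1, sigma=1.0):
    """Store the a priori errors e_n and the inputs x_n, as required by Eq. (30)."""
    centers, errors = [], []
    for x_n, d_n in zip(X, d):
        # Prediction with the kernel expansion accumulated so far (Eq. 30).
        y_n = eta * sum(e * gaussian_kernel(c, x_n, sigma)
                        for c, e in zip(centers, errors))
        errors.append(d_n - y_n)   # a priori error e_n
        centers.append(x_n)
    return centers, errors

def klms_predict(x, centers, errors, eta=0.1, sigma=1.0):
    # Eq. (30): y = eta * sum_n e_n * kappa(x_n, x)
    return eta * sum(e * gaussian_kernel(c, x, sigma)
                     for c, e in zip(centers, errors))
```

This makes explicit the point made above: the weight vector in the feature space is never formed, but the stored error/input pairs grow with the number of training presentations.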


Cite this article

Barreto, G.A., Barros, A.L.B.P. On the Design of Robust Linear Pattern Classifiers Based on \(M\)-Estimators. Neural Process Lett 42, 119–137 (2015). https://doi.org/10.1007/s11063-014-9393-2
