Elsevier

Neurocomputing

Volume 70, Issues 1–3, December 2006, Pages 475-488

The regularized LVQ1 algorithm

https://doi.org/10.1016/j.neucom.2005.12.123

Abstract

This paper introduces a straightforward generalization of the well-known LVQ1 algorithm for nearest neighbour classifiers that includes the standard LVQ1 and the k-means algorithms as special cases. It is based on a regularizing parameter that monotonically decreases the upper bound of the training classification error towards a minimum. Experiments using 10 real data sets show the utility of this simple extension of LVQ1.

Introduction

Nearest neighbour (NN) methods [16] are still among the simplest and most successful ways of solving pattern recognition problems. As several comparative studies on real-world problems [39], [29] suggest, NN methods are often very competitive in comparison with more sophisticated and modern algorithms. Their success can be explained from a theoretical point of view, since they converge to the Bayes classifier, for all distributions, as the number of neighbours K and the number of prototypes M tend to infinity at an appropriate rate. Moreover, for finite K and M→∞, the misclassification probability of NN methods is only slightly higher than the Bayes error $P_{\mathrm{err}}^{\mathrm{B}}$ (i.e. $P_{\mathrm{err}}^{K\text{-NN}}\leqslant(1+\sqrt{2/K})\,P_{\mathrm{err}}^{\mathrm{B}}$ and $P_{\mathrm{err}}^{1\text{-NN}}\leqslant 2P_{\mathrm{err}}^{\mathrm{B}}$ [17]). These facts, combined with recent advances on memory-based systems [46], lazy methods [1] and local regression [20], have revived interest in these techniques in the last decade. A plethora of new learning algorithms has recently been studied (e.g. [18], [19], [36] and others discussed in Section 2.2).

NN classifiers are local learning systems [15]: they fit the training data only in a region around the location of an input pattern. Given a pattern x to classify, the K-nearest-neighbour classification rule is based on the following algorithm (a short code sketch of this rule is given after the two steps):

  • (i)

    Find the K nearest patterns to x in the set of prototypes P={(mj, cl(mj)), j=0,…,M−1}, where mj is a prototype that belongs to one of the classes and cl(mj) is its class indicator variable.

  • (ii)

    Establish the classification by a majority vote amongst these K patterns.
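As an illustration of steps (i)–(ii), the following Python sketch implements the K-nearest-neighbour rule over a labelled prototype set. It is illustrative only: the names knn_classify, prototypes and labels are ours, Euclidean distance is assumed, and voting ties are broken in favour of the smallest class index.

```python
import numpy as np

def knn_classify(x, prototypes, labels, k=1):
    """Classify x by a majority vote among the k nearest prototypes.

    x          : (p,) query pattern
    prototypes : (M, p) array of prototype vectors m_j
    labels     : (M,) integer array of class indicators cl(m_j)
    """
    # (i) find the k prototypes closest to x (Euclidean distance)
    dists = np.linalg.norm(prototypes - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # (ii) decide the class by a majority vote among those k prototypes
    votes = np.bincount(labels[nearest])
    return int(np.argmax(votes))
```

For K=1 this reduces to returning the label of the single closest prototype, which is the classifier considered throughout the paper.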

NN classifiers allow for a variety of design choices that can be adjusted automatically from data, such as the metric d used to measure closeness between patterns, the number of neighbours K, the set of prototypes P and its size M. The most common dissimilarity measure d(x, y) is the Euclidean distance $d(\mathbf{x},\mathbf{y})=\|\mathbf{x}-\mathbf{y}\|_2$, while K can be selected automatically using a validation set (a short sketch of this selection is given after the list below), although K=1 is a common choice due to the following:

  • (1)

    Euclidean 1-NN classifiers form class boundaries with piecewise linear hyperplanes. They can therefore be used to solve a large class of classification problems, since any decision border can be approximated by a series of locally defined hyperplanes.

  • (2)

    Most of the learning algorithms that compute P from training data work with 1-NN classifiers.
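Where K is not fixed to 1, the validation-based selection mentioned above can be sketched as follows. This assumes a held-out validation split (x_val, y_val) and reuses numpy and the knn_classify helper from the previous sketch; all names are our own illustrative choices.

```python
def select_k(prototypes, labels, x_val, y_val, candidates=(1, 3, 5, 7)):
    """Pick the value of K with the highest accuracy on a held-out validation set."""
    best_k, best_acc = candidates[0], -1.0
    for k in candidates:
        preds = np.array([knn_classify(x, prototypes, labels, k=k) for x in x_val])
        acc = float(np.mean(preds == y_val))
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k
```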

Finally, the design of the set of prototypes is the most difficult and challenging task. The simplest method would be to select the whole training set DN={(xi, cl(xi)), i=0,…,N−1} (where xi is a random sample of X and cl(xi) is the class label associated with xi) as P. Nevertheless, this option results in large memory and execution requirements for large databases. Therefore, in practice, a small set of prototypes of size M (with M≪N) is mandatory. There are three main classes of learning algorithms designed to reduce the number of prototypes stored:

  • (1)

    Condensing algorithms: Since only the training data near the class borders are useful in the classification process, condensing procedures aim to keep those training points that form the class boundaries [17, Section 19].

  • (2)

    Editing algorithms: These retain the training patterns that fall inside the class borders estimated with the same training set. Such patterns tend to form homogeneous clusters, because only points at the centre of the natural groups in the data are retained [17, Section 26].

  • (3)

    Clustering algorithms: It is also feasible to use any NN vector quantization algorithm [23] (e.g. K-means [38]) to form a set of labelled prototypes. Firstly, we obtain a set of unlabelled prototypes from the training data using the clustering algorithm. These prototypes can then be used to divide the input space into nearest-neighbour cells. Finally, we assign a label to each prototype according to a majority vote of the training data in its cell [17, Section 21.5] (a code sketch of this labelling step follows the list). However, it is also possible to compute labelled centroids directly using a one-step learning strategy such as the learning vector quantization (LVQ) algorithms [30].
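The two-stage clustering route in item (3) can be sketched as follows: an unsupervised algorithm (e.g. K-means) supplies the prototype positions, and each prototype is then labelled by a majority vote of the training patterns in its cell. The code is our own illustration, not the paper's; the clustering step itself is left to any standard implementation.

```python
import numpy as np

def label_prototypes(prototypes, x_train, y_train, n_classes):
    """Label each prototype by a majority vote of the training patterns
    that fall in its nearest-neighbour (Voronoi) cell."""
    # assign every training pattern to the cell of its nearest prototype
    d = np.linalg.norm(x_train[:, None, :] - prototypes[None, :, :], axis=2)
    cell = np.argmin(d, axis=1)          # index of the winning prototype for each pattern
    labels = np.zeros(len(prototypes), dtype=int)
    for j in range(len(prototypes)):
        members = y_train[cell == j]
        if len(members):
            labels[j] = int(np.bincount(members, minlength=n_classes).argmax())
        # empty cells keep class 0 in this sketch; in practice they could be pruned
    return labels
```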

As pointed out in [17, Section 19.3], clustering algorithms seem preferable to condensing and editing algorithms for the following reason: if the values of P are allowed to be arbitrary, the prototypes are not constrained to training points, so a more flexible class of classifiers can be designed. However, the most preferable strategy for designing prototypes is to minimize the empirical classification error on the training set DN [17, p. 311], since the generalization error bounds for the 1-NN classifier based on VC theory [17, Section 19] can then be applied. Our work shows that LVQ1 does not minimize this classification error, but that a simple and straightforward generalization, which introduces a regularizing parameter, monotonically decreases the upper bound of the misclassification rate of the 1-NN classifier and thus improves the classification results obtained by LVQ1.
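Since the advocated strategy is to minimize the empirical classification error on DN, it is worth spelling that quantity out. A minimal sketch (names are ours) that computes the training error of a 1-NN classifier defined by a labelled prototype set:

```python
import numpy as np

def training_error(prototypes, proto_labels, x_train, y_train):
    """Empirical misclassification rate on D_N of the 1-NN rule
    defined by the labelled prototype set P."""
    d = np.linalg.norm(x_train[:, None, :] - prototypes[None, :, :], axis=2)
    winners = np.argmin(d, axis=1)       # nearest prototype for each training pattern
    return float(np.mean(proto_labels[winners] != y_train))
```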

The paper is organized as follows. In Section 2, a review of LVQ1 and its basic limitations is presented. Section 3 introduces and analyses RegLVQ1, a regularized form of the LVQ1 algorithm, which controls the upper bound of the training error. Section 4 includes a comparative empirical study of RegLVQ1 and other learning algorithms for 1-NN classifiers in ten real-world problems. Finally, some conclusions are given in Section 5.


Limitations of LVQ1

Suppose we have N observation pairs $D_N=\{(\mathbf{x}_i, cl(\mathbf{x}_i)),\ i=0,\ldots,N-1\}$, where $\mathbf{x}_i\in\mathbb{R}^p$ is a random pattern that belongs to one of the c classes and $cl(\mathbf{x}_i)$ is the class label associated with $\mathbf{x}_i$. The aim of a learning algorithm for a Euclidean nearest neighbour (NN) classifier is to design a set of labelled prototypes $P=[\mathbf{m}_0^T\ \mathbf{m}_1^T\ \cdots\ \mathbf{m}_{M-1}^T]^T$ using $D_N$. There are M Voronoi regions $R_j=\{\mathbf{x}\ |\ \|\mathbf{x}-\mathbf{m}_j\|=\min_{i=0,\ldots,M-1}\|\mathbf{x}-\mathbf{m}_i\|\}$, j=0,…,M−1, where the classifier maps any input pattern that falls within it to the class to which its
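For reference, the standard LVQ1 rule [30] updates only the winning prototype of each training pattern, attracting it when the labels agree and repelling it otherwise. The sketch below is our own minimal online version with a fixed learning rate (Kohonen's formulation uses a decreasing schedule); prototypes is a float array updated in place.

```python
import numpy as np

def lvq1_epoch(prototypes, proto_labels, x_train, y_train, alpha=0.05):
    """One pass of the standard LVQ1 rule over the training set."""
    for x, y in zip(x_train, y_train):
        # the winner is the prototype nearest to x in Euclidean distance
        j = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))
        if proto_labels[j] == y:
            prototypes[j] += alpha * (x - prototypes[j])   # attract the winner
        else:
            prototypes[j] -= alpha * (x - prototypes[j])   # repel the winner
    return prototypes
```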

The RegLVQ1 cost function

A simple way to improve the classification rate of LVQ1 leads to the so-called regularized LVQ1 (RegLVQ1) algorithm, which is based on minimizing the following cost function:
$$E_{\mathrm{RegLVQ1}}(P,\lambda)=\frac{1}{2N}\sum_{i=0}^{N-1}\sum_{j=0}^{M-1}E_{\mathrm{RegLVQ1}}(\mathbf{m}_j,\lambda)$$
with
$$E_{\mathrm{RegLVQ1}}(\mathbf{m}_j,\lambda)=\mathbf{1}(\mathbf{x}_i\in R_j)\bigl(\mathbf{1}(cl(\mathbf{x}_i)=cl(\mathbf{m}_j))-\lambda\,\mathbf{1}(cl(\mathbf{x}_i)\neq cl(\mathbf{m}_j))\bigr)\,\|\mathbf{x}_i-\mathbf{m}_j\|^2,$$
where a regularizing parameter λ is introduced in the cost function. Note that (16) gives the quantization error computed separately for each class when λ=0 and is equivalent to $E_{\mathrm{LVQ1}}$ when λ=1
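One plausible reading of this cost is through its stochastic gradient: for the winning prototype of a pattern, the attraction term is the usual LVQ1 step, while the repulsion is scaled by λ. The sketch below is our own derivation under that reading, not the paper's published update rule (which is not visible in this snippet); with λ=1 it coincides with the LVQ1 step shown earlier, and with λ=0 the repulsion disappears.

```python
import numpy as np

def reglvq1_epoch(prototypes, proto_labels, x_train, y_train, lam=0.5, alpha=0.05):
    """One pass of a lambda-scaled, LVQ1-style update obtained from a
    stochastic gradient step on the RegLVQ1 cost (illustrative only)."""
    for x, y in zip(x_train, y_train):
        j = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))  # winner
        if proto_labels[j] == y:
            prototypes[j] += alpha * (x - prototypes[j])            # attraction, as in LVQ1
        else:
            prototypes[j] -= alpha * lam * (x - prototypes[j])      # repulsion scaled by lambda
    return prototypes
```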

Data sets

Our experiments used three small data sets (Glass, Sonar and Soybean), four medium-sized data sets (DNA, Satimage, Segment and Speech) and three large sets (Lower, Upper and Shuttle). All but the Speech, Lower and Upper data sets are in the UCI repository of machine learning databases [12]. The Speech data set was taken from the LVQ_PAK [31] and the Upper and Lower data sets were generated from the handwritten NIST database [22]. All these databases belong to real-world problems. Their main

Conclusions

A straightforward generalization of LVQ1 (RegLVQ1) has been presented which includes K-means and LVQ1 as special cases. The minimization of the RegLVQ1 cost function ensures a training classification error errT(λ)⩽1/(1+λ), where λ is a regularizing parameter determined by the user. This also reduces the number of feasible solutions and can be considered a simple mechanism for avoiding undesired local minima. However, in practice, a finite value of λ is employed and the amount of

Acknowledgements

The author wishes to thank the anonymous reviewers for their comments, which have helped to improve the final version of this paper. This work was supported in part by the Ministerio de Educación y Ciencia and by the EU's European Regional Development Fund through Grant TEC2004-05127-C02-01.

Sergio Bermejo received his M.Sc. and Ph.D. degrees in Telecommunication Engineering, in 1996 and 2000, respectively, from the Universitat Politècnica de Catalunya (UPC). In 1996, he joined UPC's Department of Electronics Engineering (DEE) as a researcher. Currently, he holds a position as associate professor in the DEE and teaches at the School of Telecommunications Engineering of Barcelona (ETSETB). His research interests are statistical learning, with a special focus on large margin classification, unsupervised learning and their application to smart sensors, signal processing, software agents and autonomous robotics.

References (47)

  • S. Bermejo et al., Learning with 1-nearest-neighbour classifiers, Neural Process. Lett. (2001).
  • S. Bermejo et al., Local averaging of ensembles of LVQ-based nearest neighbour classifiers, Appl. Intell. (2004).
  • J.C. Bezdek et al., Multiple-prototype classifier design, IEEE Trans. Systems, Man, Cybern.-Part C: Appl. Rev. (1998).
  • C. Bishop, Neural Networks for Pattern Recognition (1995).
  • C.L. Blake, C.J. Merz, UCI Repository of machine learning databases, University of California, Department of...
  • T. Bojer et al., Relevance determination in learning vector quantization.
  • L. Bottou, Online learning and stochastic approximation.
  • L. Bottou et al., Local learning algorithms, Neural Comput. (1992).
  • L. Devroye et al., A Probabilistic Theory of Pattern Recognition (1996).
  • A. Djouadi, On the reduction of the nearest-neighbor variation for more accurate classification and error estimates, IEEE Trans. Pattern Anal. Mach. Intell. (1998).
  • A. Djouadi et al., A fast algorithm for the nearest-neighbor classifier, IEEE Trans. Pattern Anal. Mach. Intell. (1997).
  • J. Fan et al., Local Polynomial Modelling and its Applications (1996).