Elsevier

Neurocomputing

Volume 70, Issues 1–3, December 2006, Pages 475-488

The regularized LVQ1 algorithm

https://doi.org/10.1016/j.neucom.2005.12.123

Abstract

This paper introduces a straightforward generalization of the well-known LVQ1 algorithm for nearest neighbour classifiers that includes the standard LVQ1 and the k-means algorithms as special cases. It is based on a regularizing parameter that monotonically decreases the upper bound of the training classification error towards a minimum. Experiments using 10 real data sets show the utility of this simple extension of LVQ1.

Introduction

Nearest neighbour (NN) methods [16] are still among the simplest and most successful ways of solving pattern recognition problems. As several comparative studies on real-world problems [39], [29] suggest, NN methods are often very competitive in comparison with more sophisticated and modern algorithms. Their success can be explained from a theoretical point of view, since they converge to the Bayes classifier, for all distributions, as the number of neighbours K and the number of prototypes M tend to infinity at an appropriate rate. Moreover, for finite K and M→∞, the misclassification probability of NN methods is only slightly higher than the Bayes error $P_{\mathrm{err}}^{\mathrm{B}}$ (i.e. $P_{\mathrm{err}}^{K\text{-NN}}\leqslant(1+\sqrt{2/K})\,P_{\mathrm{err}}^{\mathrm{B}}$ and $P_{\mathrm{err}}^{1\text{-NN}}\leqslant 2P_{\mathrm{err}}^{\mathrm{B}}$ [17]). These facts, combined with recent advances on memory-based systems [46], lazy methods [1] and local regression [20], have revived interest in these techniques in the last decade. A plethora of new learning algorithms has recently been studied (e.g. [18], [19], [36] and others discussed in Section 2.2).

NN classifiers are local learning systems [15]: they fit the training data only in a region around the location of an input pattern. Given a pattern x to classify, the K-nearest-neighbour classification rule is based on the following algorithm (a short code sketch of this rule is given after the two steps):

  • (i)

    Find the K nearest patterns to x in the set of prototypes P={(mj, cl(mj)), j=0,…,M−1}, where mj is a prototype that belongs to one of the classes and cl(mj) is its class indicator variable.

  • (ii)

    Establish the classification by a majority vote amongst these K patterns.
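As an illustration of steps (i)–(ii), the following Python sketch implements the K-nearest-neighbour rule over a labelled prototype set. It is illustrative only: the names knn_classify, prototypes and labels are ours, Euclidean distance is assumed, and voting ties are broken in favour of the smallest class index.

```python
import numpy as np

def knn_classify(x, prototypes, labels, k=1):
    """Classify x by a majority vote among the k nearest prototypes.

    x          : (p,) query pattern
    prototypes : (M, p) array of prototype vectors m_j
    labels     : (M,) integer array of class indicators cl(m_j)
    """
    # (i) find the k prototypes closest to x (Euclidean distance)
    dists = np.linalg.norm(prototypes - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # (ii) decide the class by a majority vote among those k prototypes
    votes = np.bincount(labels[nearest])
    return int(np.argmax(votes))
```

For K=1 this reduces to returning the label of the single closest prototype, which is the classifier considered throughout the paper.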

NN classifiers allow for a variety of design choices that can be adjusted automatically from data, such as the metric d used to measure closeness between patterns, the number of neighbours K, the set of prototypes P and its size M. The most common dissimilarity measure d(x, y) is the Euclidean distance $d(\mathbf{x},\mathbf{y})=\|\mathbf{x}-\mathbf{y}\|_2$, while K can be selected automatically using a validation set (a short sketch of this selection is given after the list below), although K=1 is a common choice due to the following:

  • (1)

    Euclidean 1-NN classifiers form class boundaries with piecewise linear hyperplanes. They can therefore be used to solve a large class of classification problems, since any decision border can be approximated by a series of locally defined hyperplanes.

  • (2)

    Most of the learning algorithms that compute P from training data work with 1-NN classifiers.
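Where K is not fixed to 1, the validation-based selection mentioned above can be sketched as follows. This assumes a held-out validation split (x_val, y_val) and reuses numpy and the knn_classify helper from the previous sketch; all names are our own illustrative choices.

```python
def select_k(prototypes, labels, x_val, y_val, candidates=(1, 3, 5, 7)):
    """Pick the value of K with the highest accuracy on a held-out validation set."""
    best_k, best_acc = candidates[0], -1.0
    for k in candidates:
        preds = np.array([knn_classify(x, prototypes, labels, k=k) for x in x_val])
        acc = float(np.mean(preds == y_val))
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k
```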

Finally, the design of the set of prototypes is the most difficult and challenging task. The simplest method would be to select the whole training set DN={(xi, cl(xi)), i=0,…,N−1} (where xi is a random sample of X and cl(xi) is the class label associated with xi) as P. Nevertheless, this option results in large memory and execution requirements for large databases. Therefore, in practice, a small set of prototypes of size M (with M≪N) is mandatory. There are three main classes of learning algorithms designed to reduce the number of prototypes stored:

  • (1)

    Condensing algorithms: Since only the training data near the class borders are useful in the classification process, condensing procedures aim to keep those training points that form the class boundaries [17, Section 19].

  • (2)

    Editing algorithms: These retain the training patterns that fall inside the class borders estimated with the same training set. Such patterns tend to form homogeneous clusters, because only points at the centre of the natural groups in the data are retained [17, Section 26].

  • (3)

    Clustering algorithms: It is also feasible to use any NN vector quantization algorithm [23] (e.g. K-means [38]) to form a set of labelled prototypes. Firstly, we obtain a set of unlabelled prototypes from the training data using the clustering algorithm. These prototypes can then be used to divide the input space into nearest-neighbour cells. Finally, we assign a label to each prototype according to a majority vote of the training data in its cell [17, Section 21.5] (a code sketch of this labelling step follows the list). However, it is also possible to compute labelled centroids directly using a one-step learning strategy such as the learning vector quantization (LVQ) algorithms [30].
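The two-stage clustering route in item (3) can be sketched as follows: an unsupervised algorithm (e.g. K-means) supplies the prototype positions, and each prototype is then labelled by a majority vote of the training patterns in its cell. The code is our own illustration, not the paper's; the clustering step itself is left to any standard implementation.

```python
import numpy as np

def label_prototypes(prototypes, x_train, y_train, n_classes):
    """Label each prototype by a majority vote of the training patterns
    that fall in its nearest-neighbour (Voronoi) cell."""
    # assign every training pattern to the cell of its nearest prototype
    d = np.linalg.norm(x_train[:, None, :] - prototypes[None, :, :], axis=2)
    cell = np.argmin(d, axis=1)          # index of the winning prototype for each pattern
    labels = np.zeros(len(prototypes), dtype=int)
    for j in range(len(prototypes)):
        members = y_train[cell == j]
        if len(members):
            labels[j] = int(np.bincount(members, minlength=n_classes).argmax())
        # empty cells keep class 0 in this sketch; in practice they could be pruned
    return labels
```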

As pointed out in [17, Section 19.3], clustering algorithms seem preferable to condensing and editing algorithms for the following reason: if the values of P are allowed to be arbitrary, the prototypes are not constrained to training points, so a more flexible class of classifiers can be designed. However, the most preferable strategy for designing prototypes is to minimize the empirical classification error on the training set DN [17, p. 311], since the generalization error bounds for the 1-NN classifier based on VC theory [17, Section 19] can then be applied. Our work shows that LVQ1 does not minimize this classification error, but that a simple and straightforward generalization, which introduces a regularizing parameter, monotonically decreases the upper bound of the misclassification rate of the 1-NN classifier and thus improves the classification results obtained by LVQ1.
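Since the advocated strategy is to minimize the empirical classification error on DN, it is worth spelling that quantity out. A minimal sketch (names are ours) that computes the training error of a 1-NN classifier defined by a labelled prototype set:

```python
import numpy as np

def training_error(prototypes, proto_labels, x_train, y_train):
    """Empirical misclassification rate on D_N of the 1-NN rule
    defined by the labelled prototype set P."""
    d = np.linalg.norm(x_train[:, None, :] - prototypes[None, :, :], axis=2)
    winners = np.argmin(d, axis=1)       # nearest prototype for each training pattern
    return float(np.mean(proto_labels[winners] != y_train))
```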

The paper is organized as follows. In Section 2, a review of LVQ1 and its basic limitations is presented. Section 3 introduces and analyses RegLVQ1, a regularized form of the LVQ1 algorithm, which controls the upper bound of the training error. Section 4 includes a comparative empirical study of RegLVQ1 and other learning algorithms for 1-NN classifiers in ten real-world problems. Finally, some conclusions are given in Section 5.


Limitations of LVQ1

Suppose we have N observation pairs $D_N=\{(\mathbf{x}_i, cl(\mathbf{x}_i)),\ i=0,\ldots,N-1\}$, where $\mathbf{x}_i\in\mathbb{R}^p$ is a random pattern that belongs to one of the c classes and $cl(\mathbf{x}_i)$ is the class label associated with $\mathbf{x}_i$. The aim of a learning algorithm for a Euclidean nearest neighbour (NN) classifier is to design a set of labelled prototypes $P=[\mathbf{m}_0^T\ \mathbf{m}_1^T\ \cdots\ \mathbf{m}_{M-1}^T]^T$ using $D_N$. There are M Voronoi regions $R_j=\{\mathbf{x}\ |\ \|\mathbf{x}-\mathbf{m}_j\|=\min_{i=0,\ldots,M-1}\|\mathbf{x}-\mathbf{m}_i\|\}$, j=0,…,M−1, where the classifier maps any input pattern that falls within it to the class to which its
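For reference, the standard LVQ1 rule [30] updates only the winning prototype of each training pattern, attracting it when the labels agree and repelling it otherwise. The sketch below is our own minimal online version with a fixed learning rate (Kohonen's formulation uses a decreasing schedule); prototypes is a float array updated in place.

```python
import numpy as np

def lvq1_epoch(prototypes, proto_labels, x_train, y_train, alpha=0.05):
    """One pass of the standard LVQ1 rule over the training set."""
    for x, y in zip(x_train, y_train):
        # the winner is the prototype nearest to x in Euclidean distance
        j = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))
        if proto_labels[j] == y:
            prototypes[j] += alpha * (x - prototypes[j])   # attract the winner
        else:
            prototypes[j] -= alpha * (x - prototypes[j])   # repel the winner
    return prototypes
```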

The RegLVQ1 cost function

A simple way to improve the classification rate of LVQ1 leads to the so-called regularized LVQ1 (RegLVQ1) algorithm, which is based on minimizing the following cost function:
$$E_{\mathrm{RegLVQ1}}(P,\lambda)=\frac{1}{2N}\sum_{i=0}^{N-1}\sum_{j=0}^{M-1}E_{\mathrm{RegLVQ1}}(\mathbf{m}_j,\lambda)$$
with
$$E_{\mathrm{RegLVQ1}}(\mathbf{m}_j,\lambda)=\mathbf{1}(\mathbf{x}_i\in R_j)\bigl(\mathbf{1}(cl(\mathbf{x}_i)=cl(\mathbf{m}_j))-\lambda\,\mathbf{1}(cl(\mathbf{x}_i)\neq cl(\mathbf{m}_j))\bigr)\,\|\mathbf{x}_i-\mathbf{m}_j\|^2,$$
where a regularizing parameter λ is introduced in the cost function. Note that (16) gives the quantization error computed separately for each class when λ=0 and is equivalent to $E_{\mathrm{LVQ1}}$ when λ=1
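One plausible reading of this cost is through its stochastic gradient: for the winning prototype of a pattern, the attraction term is the usual LVQ1 step, while the repulsion is scaled by λ. The sketch below is our own derivation under that reading, not the paper's published update rule (which is not visible in this snippet); with λ=1 it coincides with the LVQ1 step shown earlier, and with λ=0 the repulsion disappears.

```python
import numpy as np

def reglvq1_epoch(prototypes, proto_labels, x_train, y_train, lam=0.5, alpha=0.05):
    """One pass of a lambda-scaled, LVQ1-style update obtained from a
    stochastic gradient step on the RegLVQ1 cost (illustrative only)."""
    for x, y in zip(x_train, y_train):
        j = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))  # winner
        if proto_labels[j] == y:
            prototypes[j] += alpha * (x - prototypes[j])            # attraction, as in LVQ1
        else:
            prototypes[j] -= alpha * lam * (x - prototypes[j])      # repulsion scaled by lambda
    return prototypes
```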

Data sets

Our experiments used three small data sets (Glass, Sonar and Soybean), four medium-sized data sets (DNA, Satimage, Segment and Speech) and three large sets (Lower, Upper and Shuttle). All but the Speech, Lower and Upper data sets are in the UCI repository of machine learning databases [12]. The Speech data set was taken from the LVQ_PAK [31] and the Upper and Lower data sets were generated from the handwritten NIST database [22]. All these databases belong to real-world problems. Their main

Conclusions

A straightforward generalization of LVQ1 (RegLVQ1) has been presented which includes K-means and LVQ1 as special cases. The minimization of the RegLVQ1 cost function ensures a training classification error errT(λ)⩽1/(1+λ), where λ is a regularizing parameter determined by the user. This also reduces the number of feasible solutions and can be considered a simple mechanism for avoiding undesired local minima. However, in practice, a finite value of λ is employed and the amount of

Acknowledgements

The author wishes to thank the anonymous reviewers for their comments, which have helped to improve the final version of this paper. This work was supported in part by the Ministerio de Educación y Ciencia and by the EU's European Regional Development Fund through Grant TEC2004-05127-C02-01.

Sergio Bermejo received his M.Sc. and Ph.D. degrees in Telecommunication Engineering, in 1996 and 2000, respectively, from the Universitat Politècnica de Catalunya (UPC). In 1996, he joined UPC's Department of Electronics Engineering (DEE) as a researcher. Currently, he holds a position as associate professor in the DEE and teaches at the School of Telecommunications Engineering of Barcelona (ETSETB). His research interests are statistical learning, with a special focus on large margin classification, unsupervised learning and their application to smart sensors, signal processing, software agents and autonomous robotics.

References (47)

  • S. Bermejo et al., Learning with 1-nearest-neighbour classifiers, Neural Process. Lett. (2001).
  • S. Bermejo et al., Local averaging of ensembles of LVQ-based nearest neighbour classifiers, Appl. Intell. (2004).
  • J.C. Bezdek et al., Multiple-prototype classifier design, IEEE Trans. Systems, Man, Cybern.-Part C: Appl. Rev. (1998).
  • C. Bishop, Neural Networks for Pattern Recognition (1995).
  • C.L. Blake, C.J. Merz, UCI Repository of machine learning databases, University of California, Department of...
  • T. Bojer et al., Relevance determination in learning vector quantization.
  • L. Bottou, Online learning and stochastic approximation.
  • L. Bottou et al., Local learning algorithms, Neural Comput. (1992).
  • L. Devroye et al., A Probabilistic Theory of Pattern Recognition (1996).
  • A. Djouadi, On the reduction of the nearest-neighbor variation for more accurate classification and error estimates, IEEE Trans. Pattern Anal. Mach. Intell. (1998).
  • A. Djouadi et al., A fast algorithm for the nearest-neighbor classifier, IEEE Trans. Pattern Anal. Mach. Intell. (1997).
  • J. Fan et al., Local Polynomial Modelling and its Applications (1996).