Abstract
This paper introduces a learning strategy for designing the set of prototypes of a 1-nearest-neighbour (1-NN) classifier. In the learning phase, the 1-NN classifier is transformed into a maximum classifier whose discriminant functions use the nearest models of a mixture. Computing the set of prototypes is then viewed as the problem of estimating the centres of a mixture model. However, instead of computing these centres with a standard procedure such as the EM algorithm, we derive a learning algorithm that minimises the misclassification rate of the 1-NN classifier on the training set. One possible implementation of the learning algorithm is presented, based on the online gradient descent method and the use of radial Gaussian kernels for the models of the mixture. Experimental results on the handwritten NIST databases show the superiority of the proposed method over Kohonen's LVQ algorithms.
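The idea in the abstract can be sketched in a few lines of code. The sketch below is an illustration, not the authors' exact algorithm: the function names are invented, and the smooth misclassification measure is a GLVQ-style surrogate assumed here for concreteness. Each training sample pulls its nearest same-class prototype closer and pushes the nearest other-class prototype away via online gradient descent; classification then takes the maximum of radial Gaussian discriminants over the prototypes, which for a shared width is equivalent to the 1-NN rule.

```python
import numpy as np

def train_prototypes(X, y, prototypes, labels, lr=0.05, epochs=20, seed=0):
    """Online gradient-descent sketch of prototype learning.

    For each sample, the nearest same-class prototype is moved toward it
    and the nearest other-class prototype away from it, weighted by the
    derivative of a smooth misclassification measure (a GLVQ-style
    surrogate, assumed here for illustration).
    """
    rng = np.random.default_rng(seed)
    P = prototypes.astype(float).copy()
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x, c = X[i], y[i]
            d = np.sum((P - x) ** 2, axis=1)          # squared distances
            same = np.where(labels == c)[0]
            diff = np.where(labels != c)[0]
            j = same[np.argmin(d[same])]              # nearest correct prototype
            k = diff[np.argmin(d[diff])]              # nearest incorrect prototype
            # smooth misclassification measure in (-1, 1); > 0 means an error
            mu = (d[j] - d[k]) / (d[j] + d[k] + 1e-12)
            g = 0.25 * (1.0 - mu ** 2)                # sigmoid-loss derivative
            P[j] += lr * g * (x - P[j])               # attract correct prototype
            P[k] -= lr * g * (x - P[k])               # repel incorrect prototype
    return P

def classify(x, P, labels, sigma=1.0):
    """Maximum classifier with radial Gaussian discriminants over prototypes."""
    scores = np.exp(-np.sum((P - x) ** 2, axis=1) / (2.0 * sigma ** 2))
    return labels[np.argmax(scores)]
```

With a shared kernel width, `argmax` of the Gaussian scores coincides with `argmin` of the distances, so `classify` behaves exactly like a 1-NN rule on the learned prototypes while remaining differentiable for training.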
References
Bengio, Y.: Artificial Neural Networks and Their Application to Sequence Recognition, Ph.D. thesis, Department of Computer Science, McGill University (1991).
Benveniste, A., Métivier, M. and Priouret, P.: Adaptive Algorithms and Stochastic Approximations, Springer-Verlag, Berlin (1990).
Bermejo, S. and Cabestany, J.: Finite-Sample Convergence Properties of the LVQ1 algorithm and the BLVQ1 algorithm, Neural Processing Letters 13, 135–157 (this issue).
Bishop, C. M.: Neural Networks and Pattern Recognition, Oxford University Press, Oxford (1995).
Bottou, L.: Online Learning and Stochastic Approximations, In David Saad (ed.), On-line Learning in Neural Networks, Cambridge University Press, Cambridge (1998).
Breiman, L.: Half-&-Half Bagging and Hard Boundary Points, Department of Statistics, University of California, Technical Report No. 534 (1998).
Dasarathy, B. V. (ed.): Nearest Neighbor Pattern Classification Techniques, IEEE Computer Society Press, Los Alamitos, CA (1991).
Dempster, A. P., Laird, N. M. and Rubin, D. B.: Maximum Likelihood from incomplete data via the EM algorithm, J. Roy. Stat. Soc. B 39 (1977), 1–38.
Devroye, L., Györfi, L. and Lugosi, G.: A Probabilistic Theory of Pattern Recognition, Springer-Verlag, Berlin (1996).
Duda, R. O., and Hart, P. E.: Pattern Classification and Scene Analysis, John Wiley Interscience, New York (1973).
Friedman, J.: Flexible Metric Nearest Neighbor Classification, Department of Statistics and Stanford Linear Accelerator Center, Stanford University, Stanford, CA, Technical Report (1994).
Garris, M. et al.: NIST Form-Based Handprint Recognition System (Release 2.0), National Institute of Standards and Technology (1997).
Hart, P. E.: The Condensed Nearest Neighbor Rule, IEEE Trans. Inf. Th. (Corresp.), IT-14 (1968), 515–516.
Haykin, S.: Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Company, Englewood Cliffs, NJ (1994).
Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J. and Torkkola, K.: LVQ_PAK: The Learning Vector Quantization Program Package, Version 3.1, Laboratory of Computer and Information Science, Helsinki University of Technology (1995).
Kohonen, T.: Self-Organizing Maps, 2nd edn, Springer-Verlag, Berlin (1996).
Mel, B. W. and Omohundro, S. M.: How Receptive Field Parameters Affect Neural Learning, In R. P. Lippmann, J. E. Moody and D. S. Touretzky (eds.), Advances in Neural Information Processing Systems 3, Morgan Kaufmann Publishers, Boston, MA (1991).
Michie, D., Spiegelhalter, D. J. and Taylor, C. C. (eds.): Machine Learning, Neural and Statistical Classification, Ellis Horwood, London (1994).
McLachlan, G. J. and Basford, K. E.: Mixture Models: Inference and Applications to Clustering, Marcel Dekker, New York (1988).
Ripley, B. D.: Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge (1996).
Schölkopf, B.: Support Vector Learning. Ph.D. Thesis, Informatik der Technischen Universität Berlin (1997).
Scott, D. W.: Multivariate Density Estimation: Theory, Practice and Visualization, John Wiley and Sons, New York (1992).
Tarter, M. E. and Lock, M. D.: Model-Free Curve Estimation, Chapman and Hall, New York (1993).
Vapnik, V.: Estimation of Dependencies based on Empirical Data, Springer-Verlag, New York (1982).
Wilson, D.: Asymptotic properties of nearest neighbor rules using edited data, IEEE Trans. on Systems, Man and Cybernetics 2, (1972), 408–421.
Cite this article
Bermejo, S., Cabestany, J. Learning with Nearest Neighbour Classifiers. Neural Processing Letters 13, 159–181 (2001). https://doi.org/10.1023/A:1011332406386