Pattern Recognition Letters

Volume 19, Issue 13, November 1998, Pages 1165-1170

Improving the k-NCN classification rule through heuristic modifications¹

https://doi.org/10.1016/S0167-8655(98)00108-1

Abstract

This paper presents an empirical investigation of the recently proposed k-Nearest Centroid Neighbours (k-NCN) classification rule, along with two heuristic modifications of it. These alternatives make use of both the proximity and the geometrical distribution of the prototypes in the training set in order to estimate the class label of a given sample. The experimental results show that both alternatives give significantly better classification rates than the k-Nearest Neighbours rule, largely due to the properties of the plain k-NCN technique.

Introduction

The k-Nearest Neighbours (k-NN) rule (Duda and Hart, 1973) is one of the most prominent non-parametric classification rules. It is a distance-based technique which classifies a test sample according to the classes of its k closest cases in a set of n previously labelled prototypes, X={x1,…,xn}. It is well known that the error of the k-NN rule tends towards the Bayes error in the asymptotic case (n→∞). In practice, however, due to the finite sample size, the k-NN estimates are no longer optimal. This problem becomes more severe when the number of prototypes is not large enough compared to the dimensionality of the feature space (Fukunaga, 1990), which is a very common practical situation.
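For reference, a minimal sketch of the plain k-NN rule as described above (function and variable names are illustrative, not taken from the paper; integer class labels are assumed):

    import numpy as np

    def knn_classify(x, prototypes, labels, k=3):
        """Classify sample x by majority vote among its k nearest prototypes.

        prototypes: (n, d) NumPy array of labelled training cases
        labels:     (n,) NumPy array of integer class labels
        """
        # Euclidean distance from x to every prototype
        dists = np.linalg.norm(prototypes - x, axis=1)
        # Indices of the k closest prototypes
        nearest = np.argsort(dists)[:k]
        # Majority vote over their class labels
        return np.argmax(np.bincount(labels[nearest]))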

A number of alternative neighbourhood definitions have been applied to classification problems in an attempt to partially overcome the practical drawbacks of the k-NN rule. In particular, the concept of Nearest Centroid Neighbourhood (NCN) (Chaudhuri, 1996), along with the neighbourhood relations derived from the Gabriel and Relative Neighbourhood graphs (Jaromczyk and Toussaint, 1992), has been successfully used in finite sample size situations (Sánchez et al., 1997). The resulting classification approaches have been generically referred to as surrounding rules because they look for prototypes that are not only close to the sample (in the basic distance sense) but also homogeneously or symmetrically distributed around it.

Although the surrounding classification schemes have been shown to outperform the k-NN rule in most cases, this kind of neighbourhood also suffers from a drawback: it may contain prototypes which are not sufficiently close to the test sample. Thus, this paper proposes some modifications of the k-NCN rule which try to solve this problem and thereby achieve better results.

The organization of the rest of this paper is as follows. Section 2 describes the NCN concept and the derived k-NCN classifier, as well as the conceptual differences with respect to the k-NN rule. In Section 3, two modifications of the k-NCN rule are introduced. Section 4 provides an experimental study for both synthetic and real data sets. Finally, some concluding remarks are given in Section 5.

Surrounding neighbourhood

The k-NN rule consists of estimating the class of a given sample through its k closest prototypes in the training set. This technique assumes that all the information required to classify a new sample can be obtained from a small subset of prototypes close to it. However, it does not take into account the geometrical distribution of those k prototypes with respect to the given sample; that is, in general the nearest prototypes do not completely surround the sample, since the k-NN rule defines the neighbourhood solely in terms of distances to the sample.
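The NCN concept (Chaudhuri, 1996) instead selects neighbours incrementally so that their centroid stays as close as possible to the sample, which forces them to surround it. A minimal sketch of this incremental search, following the published NCN definition with illustrative names (not code from the paper):

    import numpy as np

    def ncn_neighbours(x, prototypes, k=3):
        """Return indices of the k nearest centroid neighbours of x.

        prototypes: (n, d) NumPy array of training cases
        """
        selected, remaining = [], list(range(len(prototypes)))
        for _ in range(k):
            # The first NCN is simply the nearest neighbour; each later
            # NCN minimizes the distance from x to the centroid of the
            # already-selected neighbours plus the candidate.
            best, best_dist = None, np.inf
            for j in remaining:
                centroid = prototypes[selected + [j]].mean(axis=0)
                d = np.linalg.norm(centroid - x)
                if d < best_dist:
                    best, best_dist = j, d
            selected.append(best)
            remaining.remove(best)
        return selected

The k-NCN rule then classifies x by majority vote over the labels of these k neighbours, exactly as k-NN does over the k nearest ones.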

Using proximity and spatial homogeneity for classification

We here propose two heuristic modifications of the k-NCN decision rule in order to improve its correct classification rate. These alternative schemes try to jointly use information about proximity and about the spatial distribution of prototypes around a given sample. In fact, although it has been empirically shown that the k-NCN rule may outperform the k-NN classifier (Sánchez et al., 1997), some nearest centroid neighbours may be too far from the sample being classified, which can negatively affect the classification accuracy.
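This snippet is truncated and the two concrete heuristics are only given in the full text, so they are not reproduced here. Purely as an illustration of the general idea (discounting nearest centroid neighbours that lie far from the sample), one could weight each NCN vote by inverse distance. The sketch below reuses the illustrative ncn_neighbours function from the previous section and is not necessarily either of the modifications proposed in the paper:

    import numpy as np

    def weighted_ncn_classify(x, prototypes, labels, k=3):
        """Illustrative variant only: inverse-distance-weighted vote
        among the k nearest centroid neighbours of x."""
        votes = {}
        for j in ncn_neighbours(x, prototypes, k):
            d = np.linalg.norm(prototypes[j] - x)
            # Remote NCNs contribute less to the decision
            votes[labels[j]] = votes.get(labels[j], 0.0) + 1.0 / (d + 1e-12)
        return max(votes, key=votes.get)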

Empirical comparison

Several experiments using both synthetic and real databases (Murphy and Aha, 1991) have been carried out in order to compare the efficiency of the classification schemes considered in this work. Five different random partitions (half of the prototypes for training and half for testing) of each original data set have been used to obtain averaged measures of the performance of each classification rule. In particular, the focus of the present experimental study is on a comparison of the k-NN rule, the plain k-NCN rule and the two modifications introduced in Section 3.
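The averaging protocol described above can be made concrete with a short sketch (the function name and the fixed random seed are assumptions for illustration, not details from the paper):

    import numpy as np

    def average_accuracy(data, labels, classify, n_partitions=5, seed=0):
        """Mean accuracy over random 50/50 train/test partitions.

        classify: a rule such as knn_classify or weighted_ncn_classify,
                  called as classify(x, train_data, train_labels).
        """
        rng = np.random.default_rng(seed)
        accuracies = []
        for _ in range(n_partitions):
            perm = rng.permutation(len(data))
            half = len(data) // 2
            train, test = perm[:half], perm[half:]
            correct = sum(
                classify(data[i], data[train], labels[train]) == labels[i]
                for i in test
            )
            accuracies.append(correct / len(test))
        return float(np.mean(accuracies))

For example, average_accuracy(X, y, knn_classify) and average_accuracy(X, y, weighted_ncn_classify) would yield directly comparable figures under the same partitions.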

Conclusions

Alternative approaches to neighbourhood-based classification have been considered in this work. In particular, the recently introduced k-NCN decision rule, along with two heuristic modifications of it, has been studied. These extensions of the k-NCN technique try to take into account both the proximity and the geometrical distribution of the prototypes.

From the experiments carried out, it can be concluded that the modifications proposed here achieve even higher classification rates than the plain k-NCN rule.

References

Chaudhuri, B.B., 1996. A new definition of neighborhood of a point in multi-dimensional space. Pattern Recognition Letters 17, 11–17.

Duda, R.O., Hart, P.E., 1973. Pattern Classification and Scene Analysis. Wiley, New York.

Fukunaga, K., 1990. Introduction to Statistical Pattern Recognition, second ed. Academic Press, Boston.

Jaromczyk, J.W., Toussaint, G.T., 1992. Relative neighborhood graphs and their relatives. Proceedings of the IEEE 80 (9), 1502–1517.

Murphy, P.M., Aha, D.W., 1991. UCI Repository of Machine Learning Databases. Dept. of Information and Computer Science, University of California, Irvine, CA.

Sánchez, J.S., Pla, F., Ferri, F.J., 1997. On the use of neighbourhood-based non-parametric classifiers. Pattern Recognition Letters 18, 1179–1186.

¹ This work has been partially supported by projects P1B96-13 (Fundació Caixa-Castelló), and AGF95-0712-C03-01 and TIC95-676-C02-01 (Spanish CICYT).
