Pattern Recognition

Volume 35, Issue 10, October 2002, Pages 2311-2318

Improved k-nearest neighbor classification

https://doi.org/10.1016/S0031-3203(01)00132-7

Abstract

k-nearest neighbor (k-NN) classification is a well-known decision rule that is widely used in pattern classification. However, the traditional implementation of this method is computationally expensive. In this paper we develop two effective techniques, namely, template condensing and preprocessing, to significantly speed up k-NN classification while maintaining the level of accuracy. Our template condensing technique aims at “sparsifying” dense homogeneous clusters of prototypes of any single class. This is implemented by iteratively eliminating patterns which exhibit high attractive capacities. Our preprocessing technique filters out a large portion of prototypes which are unlikely to match the unknown pattern. This again accelerates the classification procedure considerably, especially in cases where the dimensionality of the feature space is high. One of our case studies shows that incorporating these two techniques into the k-NN rule achieves a seven-fold speed-up without sacrificing accuracy.

Introduction

The k-nearest neighbor (k-NN) rule [1], [2], [3], [4], [5], [6], [7], [8] is a well-known decision rule widely used in pattern classification applications. The misclassification rate of the k-NN rule approaches the optimal Bayes error rate asymptotically as k increases [3], and the rule is particularly effective when the probability distributions of the feature variables are not known, thereby rendering the Bayes decision rule [3] ineffective. The computational inefficiency of the k-NN rule stems from the following observation. To perform template matching, the complexity of each match is O(n), where n is the dimension of the feature space. In order to achieve a high recognition rate, the feature dimension n and the template size M are chosen to be large. For example, consider the GSC recognizer, which uses features based on gradient, structural, and concavity aspects of a character image [8] and uses the k-NN rule to achieve high classification accuracy. It has a feature dimension of 512 and a template size of 32,000 [8], making it quite inefficient to match a test pattern against the entire set of prototypes. In this paper we propose two effective techniques to improve the efficiency: template condensing and preprocessing.
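
As a point of reference, here is a minimal brute-force sketch of the rule, assuming squared Euclidean distance in place of the matching measure (the GSC recognizer's actual measure is not reproduced here) and using illustrative variable names:

    import numpy as np

    def brute_force_knn(x, prototypes, labels, k):
        """Classify x by matching it against every prototype: O(M*n) work.

        prototypes: (M, n) array; labels: length-M sequence of class labels.
        Squared Euclidean distance stands in for the matching measure.
        """
        labels = np.asarray(labels)
        dists = np.sum((prototypes - x) ** 2, axis=1)  # M matches, O(n) each
        nearest = np.argsort(dists)[:k]                # k nearest prototypes
        classes, counts = np.unique(labels[nearest], return_counts=True)
        return classes[counts.argmax()]                # majority vote

    # With M = 32,000 prototypes of dimension n = 512 (the GSC figures cited
    # above), each classification touches about 16.4 million feature values.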

Template condensing is an established companion to the nearest neighbor (1-NN) rule. A proper subset of the initial template is selected so that classification using the subset degrades recognition accuracy only slightly. This greatly decreases the number of prototypes against which an unknown pattern must be compared, with little sacrifice of accuracy [9], [10], [11], [12], [13]. In this paper, we develop a novel method of selecting the subset of prototypes for general k-NN classification. The idea is motivated by the observation that, if a large number of prototypes form a homogeneous cluster in feature space, then the number of prototypes in the neighborhood of a test pattern located in this area is usually larger than k (the sufficient number according to the k-NN rule). This observation is further strengthened by the fact that k is usually quite small in real applications, in order to keep the search for the k nearest prototypes efficient. Our idea is to “sparsify” dense homogeneous clusters by iteratively eliminating patterns which exhibit high “attractive capacities” (defined in Section 3). This not only reduces the template size significantly but also maintains the level of classification accuracy. In this sense the method presented in this paper differs from those described in [9], [10], [11], [12], [13].
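
Since the paper's definition of attractive capacity appears in Section 3, which is not reproduced in this excerpt, the following sketch substitutes a plausible proxy: the number of a prototype's k nearest neighbors that share its class. Prototypes buried in dense homogeneous clusters score highest and are eliminated first; all names here are illustrative, not the paper's:

    import numpy as np

    def condense(prototypes, labels, k=3):
        """Iteratively thin dense homogeneous clusters (hypothetical sketch).

        "Attractive capacity" is approximated as the number of a prototype's
        k nearest neighbors sharing its class; the paper's actual definition
        is not reproduced in this excerpt.
        """
        labels = np.asarray(labels)
        keep = np.arange(len(prototypes))
        while len(keep) > k + 1:
            P, L = prototypes[keep], labels[keep]
            # pairwise squared Euclidean distances within the kept set
            d = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=2)
            np.fill_diagonal(d, np.inf)                   # exclude self-matches
            nn = np.argsort(d, axis=1)[:, :k]             # k nearest of each
            capacity = (L[nn] == L[:, None]).sum(axis=1)  # same-class count
            if capacity.max() < k:                        # no dense cluster left
                break
            keep = np.delete(keep, capacity.argmax())     # drop most "attractive"
        return keep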

We also describe a preprocessing operation wherein an unknown pattern is matched against a prototype in two sequential stages. In the first stage a quick assessment of the potential for a match is made. The approach is motivated by the observation that the norm of a pattern vector represents a characteristic of the pattern. For a full match to proceed in the second stage, the difference between the norms of the prototype and the test pattern must be less than a predetermined threshold, designed for each prototype individually. Prototypes that fail the first stage are not considered any further, so a large portion of the prototypes is dynamically precluded. This preprocessing takes a single step per prototype, i.e., its complexity is O(1), independent of the dimensionality of the feature space. Furthermore, when properly applied, such preprocessing does not sacrifice accuracy, for it only rejects prototypes which are not “close” to the test pattern in feature space.
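
A minimal sketch of this two-stage match under the ℓ2 norm, assuming the per-prototype thresholds are given (how the paper derives them is not shown in this excerpt). The reverse triangle inequality, | ‖x‖ − ‖y‖ | ≤ ‖x − y‖, guarantees that a prototype rejected in the first stage could not have lain within its threshold distance of x:

    import numpy as np

    def knn_with_norm_filter(x, prototypes, labels, thresholds, k):
        """Two-stage k-NN: an O(1) norm test per prototype, then full matches.

        thresholds: per-prototype rejection thresholds (assumed given here).
        A prototype y survives stage one only if | ||y|| - ||x|| | is within
        its threshold; only survivors pay the O(n) cost of a full match.
        """
        labels = np.asarray(labels)
        proto_norms = np.linalg.norm(prototypes, axis=1)  # precomputed offline
        x_norm = np.linalg.norm(x)
        survivors = np.abs(proto_norms - x_norm) <= thresholds  # stage one
        idx = np.flatnonzero(survivors)
        dists = np.linalg.norm(prototypes[idx] - x, axis=1)     # stage two
        nearest = idx[np.argsort(dists)[:k]]   # may be fewer than k survivors
        classes, counts = np.unique(labels[nearest], return_counts=True)
        return classes[counts.argmax()]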

The rest of the paper is organized as follows. In Section 2, we introduce general k-NN classification. In Sections 3 and 4, we present template condensing and preprocessing, respectively. We present experimental results in Section 5, and draw conclusions in Section 6.

Preliminary: k-NN classification

Let p be the number of classes, and C ≜ {c(i), i = 1, 2, …, p} be the set of class labels. Let Φ be a set of labeled patterns, referred to as a template. A labeled pattern y ∈ ℝⁿ in the template is referred to as a prototype, where n denotes the pattern dimension. w(y) denotes the weight of a prototype y, i.e., the number of occurrences of y in the template. The class label of a prototype y is denoted by c(y).
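
For concreteness, a minimal Python rendering of these definitions (the names are illustrative, not the paper's):

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Prototype:
        y: np.ndarray  # feature vector in R^n
        c: str         # class label c(y), drawn from {c(1), ..., c(p)}
        w: int = 1     # weight w(y): number of occurrences of y in the template

    # A template Phi is simply a collection of labeled prototypes.
    template = [
        Prototype(y=np.array([0.1, 0.9]), c="c(1)", w=2),
        Prototype(y=np.array([0.8, 0.2]), c="c(2)"),
    ]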

Let H(x,y) be the matching measure between pattern x and y, where H is supposed to be a non-negative

Template condensing

In k-NN classification, the (k+1)st, (k+2)nd, …, nearest prototypes in the template to an unknown pattern x do not affect the classification of x. In fact, k is usually chosen to be a small number; otherwise, selecting the k nearest patterns over a template of size M, after all matching measures H(x,·) are calculated, requires computational complexity O(kM/p) [14]. Often the number of prototypes (all of a single class) which are nearer to x than prototypes of other classes is much larger than k (which gives the
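
Keeping k small also keeps the selection step cheap. A minimal sketch of selecting the k nearest prototypes with a bounded heap, again assuming squared Euclidean distance in place of the matching measure H (this is one standard realization of the selection step, not necessarily the paper's exact procedure):

    import heapq
    import numpy as np

    def k_nearest(x, prototypes, k):
        """Select the k nearest prototypes without fully sorting all M scores.

        heapq.nsmallest maintains a bounded heap, so the selection costs
        O(M log k) once the M matching measures have been computed.
        """
        scored = ((float(np.sum((p - x) ** 2)), i)
                  for i, p in enumerate(prototypes))
        return heapq.nsmallest(k, scored)  # [(distance, prototype index), ...]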

Preprocessing

In the previous section we have introduced a method to reduce the template size while maintaining nearly the original accuracy. In this section, we further enhance the efficiency of the k-NN algorithm. Our idea is to reject a large part of the template prototypes dynamically by carrying out computationally efficient preprocessing.

We observe that the norm of a prototype, ‖·‖, is a special characteristic of that prototype when appropriately defined (usually the ℓ1 or ℓ2 norm). An unknown pattern x

Experimental results

In this section we describe the application of the two techniques described in Sections 3 and 4 to handwritten numeral recognition, where the number of classes is 10 (p=10). The training set Ω of 126,000 patterns has an equal number of patterns in each class. The testing set has 25,300 patterns, again with an equal number in each class. The experimental platform is a 400 MHz SPARC workstation.

In our first case study, the developed techniques are applied to the “Gradient”

Conclusions and future studies

In this paper we have shown how to improve the efficiency of the k-NN classification by incorporating two novel ideas. The first idea is the reduction of the template size using the concept of attractive capacity. The second idea is a preprocessing method to preclude participation of a large portion of prototype patterns which are unlikely to match the test pattern. This work notably speeds up the classification without compromising accuracy.

The proposed template reduction technique is distinct

Acknowledgements

The authors would like to thank the anonymous referees for their numerous comments, which greatly improved and clarified the presentation.

References (14)

  • K. Hattori et al., A new nearest-neighbor rule in the pattern classification problem, Pattern Recognition (1999)
  • S.O. Belkasim et al., Pattern classification using an efficient KNNR, Pattern Recognition (1992)
  • T.M. Cover et al., Nearest neighbor pattern classification, IEEE Trans. Inform. Theory (1967)
  • R.O. Duda et al., Pattern Classification and Scene Analysis (1973)
  • K. Fukunaga et al., k-nearest-neighbor Bayes-risk estimation, IEEE Trans. Inform. Theory (1975)
  • J.M. Keller et al., A fuzzy k-nearest neighbor algorithm, IEEE Trans. Systems Man Cybernet. (1985)
  • S.A. Dudani, The distance-weighted k-nearest neighbor rule, in: Nearest Neighbor (NN) Norms: NN Pattern Classification...

There are more references available in the full text version of this article.

Cited by (159)

  • Using machine learning regression models to predict the pellet quality of pelleted feeds

    2022, Animal Feed Science and Technology
    Citation excerpt:

    The test point will be labeled using the most common label (majority voting) of the nearest k neighbors around it. The distance between two data points can be measured by metrics such as the Euclidean distance, the Hamming distance, the Manhattan distance, and the more general Minkowski distance (Wu et al., 2002). Four ensemble learning algorithms (RF, ABR, GBR, SR) that combine multiple base learners (Sagi and Rokach, 2018) were considered in this study.

About the Author—YINGQUAN WU received the B.S. and M.S. degrees in Mathematics from the Harbin Institute of Technology, Harbin, P. R. China, in 1995 and 1997, respectively. He received the M.S. degree in the Department of Electrical Engineering, State University of New York at Buffalo, USA, in 2000. Since 2000, he has been pursuing a Ph.D. in the Department of Electrical & Computer Engineering at the University of Illinois at Urbana-Champaign, USA.

About the Author—KRASSIMIR IANAKIEV received a Master's (Hons.) degree from Sofia University in 1989, and Ph.D. degrees in Mathematics and in Computer Science and Engineering in 1998 and 2000, respectively. His research interests include pattern recognition and fuzzy systems.

About the Author—VENU GOVINDARAJU received his Ph.D. in Computer Science from the State University of New York at Buffalo in 1992 and his Bachelor of Technology from the Indian Institute of Technology, Kharagpur, in 1986. Venu has co-authored over 115 technical papers (26 in journals) and holds one US patent on cursive script recognition. His main areas of interest are human–computer interaction and pattern recognition. He is currently the associate director of the Center of Excellence for Document Analysis and Recognition (CEDAR) and concurrently holds an associate professorship in the Department of Computer Science and Engineering, University at Buffalo. He is an associate editor of Pattern Recognition and the IEEE Transactions on Pattern Analysis and Machine Intelligence. Venu Govindaraju is the Program Co-chair of the Eighth International Workshop on Frontiers in Handwriting Recognition (IWFHR) in 2002. He is a senior member of the IEEE.
