Improved k-nearest neighbor classification
Introduction
The k-nearest neighbor (k-NN) rule [1], [2], [3], [4], [5], [6], [7], [8] is a well-known decision rule widely used in pattern classification applications. The misclassification rate of the k-NN rule approaches the optimal Bayes error rate asymptotically as k increases [3], and the rule is particularly effective when the probability distributions of the feature variables are not known, a situation in which the Bayes decision rule [3] cannot be applied. The computational inefficiency of the k-NN rule stems from the following observation. Each template match has complexity O(n), where n is the dimension of the feature space, and to achieve a high recognition rate both the feature dimension n and the template size M are chosen to be large. For example, consider the GSC recognizer, which uses features based on the gradient, structural, and concavity aspects of a character image [8] together with the k-NN rule to achieve high classification accuracy. It has a feature dimension of 512 and a template size of 32,000 [8], making it quite inefficient to match a test pattern against the entire set of prototypes. In this paper we propose two effective techniques to improve efficiency: template condensing and preprocessing.
Template condensing is a well-studied companion of the nearest neighbor (1-NN) rule. A subset of prototypes is selected from the initial template such that classifying with any proper subset of the selection would degrade recognition accuracy; this greatly decreases the number of prototypes an unknown pattern must be compared against, with little sacrifice of accuracy [9], [10], [11], [12], [13]. In this paper, we develop a novel method of selecting the subset of prototypes for general k-NN classification. The idea is motivated by the observation that, if a large number of prototypes form a homogeneous cluster in feature space, then the number of prototypes in the neighborhood of a test pattern located in this region is usually much larger than k, the number that suffices for the k-NN rule. This observation is reinforced by the fact that k is usually kept quite small in real applications so that the search for the k nearest prototypes remains efficient. Our idea is to “sparsify” dense homogeneous clusters by iteratively eliminating patterns which exhibit high “attractive capacities” (defined in Section 3). This reduces the template size significantly while maintaining the level of classification accuracy, and in this sense the method presented in this paper differs from those described in [9], [10], [11], [12], [13].
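The precise definition of attractive capacity appears in Section 3, which is not reproduced in this preview. As an illustration only, the sketch below takes a prototype's "attractive capacity" to be the count of same-class prototypes among its r nearest neighbors, and repeatedly removes the prototype with the highest count — thinning dense homogeneous clusters while isolated and boundary prototypes survive. The capacity definition, the radius r, and the stopping criterion are all our assumptions, not the paper's.

```python
import math

def condense(template, labels, r=5, target_size=None):
    """Iteratively thin dense homogeneous clusters of prototypes.

    The 'attractive capacity' used here -- the number of same-class
    prototypes among a prototype's r nearest neighbors -- is an
    illustrative stand-in for the paper's definition in Section 3.
    """
    pts = list(zip(template, labels))
    target_size = target_size or max(1, len(pts) // 2)
    while len(pts) > target_size:
        def capacity(i):
            xi, ci = pts[i]
            # Sort all prototypes by distance to pts[i]; near[0] is i itself.
            near = sorted(range(len(pts)),
                          key=lambda j: math.dist(pts[j][0], xi))
            return sum(1 for j in near[1:r + 1] if pts[j][1] == ci)
        # Remove the prototype sitting deepest inside a homogeneous cluster.
        best = max(range(len(pts)), key=capacity)
        if capacity(best) == 0:   # no dense homogeneous region remains
            break
        del pts[best]
    return [p for p, _ in pts], [c for _, c in pts]
```

Boundary prototypes (those with mixed-class neighborhoods) score low and are retained, which is what preserves the decision surface.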
We also describe a preprocessing operation wherein an unknown pattern is matched against a prototype in two sequential stages. In the first stage a quick assessment of the potential for a match is made, motivated by the observation that the norm of a pattern vector is itself a characteristic of the pattern. For a full match to proceed to the second stage, the difference between the norms of the prototype and the test pattern must be less than a predetermined threshold, which is designed for each prototype individually. Prototypes that fail the first stage of matching are not considered any further, so a large portion of the prototypes is dynamically precluded. This preprocessing takes just one step, i.e., its complexity is O(1), independent of the dimensionality of the feature space. Furthermore, when properly applied, such preprocessing does not sacrifice accuracy, for it only rejects prototypes which are not “close” to the test pattern in feature space.
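The no-loss-of-accuracy claim rests on the triangle inequality for any norm-induced distance (our gloss, not reproduced from the paper): for a test pattern x and prototype y,

```latex
\bigl|\,\|x\| - \|y\|\,\bigr| \;\le\; \|x - y\|,
\qquad\text{so}\qquad
\bigl|\,\|x\| - \|y\|\,\bigr| > t \;\Longrightarrow\; \|x - y\| > t .
```

Hence a prototype rejected by the norm test is guaranteed to lie at distance greater than the threshold t from the test pattern; if t is chosen conservatively, no prototype that could enter the k nearest is ever discarded.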
The rest of the paper is organized as follows. In Section 2, we introduce general k-NN classification. In Sections 3 and 4, we present template condensing and preprocessing, respectively. We present experimental results in Section 5, and draw conclusions in Section 6.
Section snippets
Preliminary: k-NN classification
Let p be the number of classes, and let Ω = {ω1, ω2, …, ωp} be the set of class labels. Let T = {x1, x2, …, xM} be a set of labeled patterns referred to as a template. A labeled pattern xi ∈ Rⁿ in the template is referred to as a prototype, where n denotes the pattern dimension and M denotes the size of the template, i.e., the number of prototypes in the template. The class label of a prototype xi is denoted by ω(xi).
Let d(x, y) be the matching measure between patterns x and y, where d is supposed to be a non-negative
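For concreteness, the k-NN decision rule of this section can be sketched in a few lines of Python. This is our illustration, not the paper's implementation: Euclidean distance stands in for the generic matching measure, and ties in the majority vote are broken arbitrarily.

```python
import math
from collections import Counter

def knn_classify(template, labels, x, k=3):
    """Classify pattern x by majority vote among its k nearest prototypes.

    template : list of prototype feature vectors
    labels   : class label of each prototype
    x        : unknown pattern to classify
    """
    # Full template scan: one O(n) distance computation per prototype.
    dists = [(math.dist(p, x), c) for p, c in zip(template, labels)]
    dists.sort(key=lambda t: t[0])
    # Majority vote over the k nearest prototypes.
    votes = Counter(c for _, c in dists[:k])
    return votes.most_common(1)[0][0]
```

The M distance computations, each O(n), are exactly the cost that the condensing and preprocessing techniques of Sections 3 and 4 attack.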
Template condensing
In k-NN classification the (k+1)st, (k+2)nd, …, nearest prototypes in the template to an unknown pattern x do not affect the classification of x. In fact, k is usually chosen to be a small number; otherwise, selecting the k nearest patterns over a template of size M, after all matching measures are calculated, requires computational complexity O(kM/p) [14]. Often the number of prototypes (all of a single class) which are nearer to x than prototypes of other classes is much larger than k (which gives the
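The selection step mentioned above — extracting the k nearest prototypes once all M matching measures are computed — need not sort the whole list. As a sketch (our illustration; the paper cites a different selection scheme from [14]), a bounded heap does it in O(M log k) rather than the O(M log M) of a full sort:

```python
import heapq

def k_nearest(measures, k):
    """Select the k smallest (distance, label) pairs out of M measures.

    heapq.nsmallest maintains a heap of size k while scanning the M
    entries once, so the cost is O(M log k) instead of O(M log M).
    """
    return heapq.nsmallest(k, measures, key=lambda t: t[0])
```

Because k is small in practice, the selection cost is dominated by the M distance computations themselves, which is why the paper focuses on reducing M.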
Preprocessing
In the previous section we have introduced a method to reduce the template size while maintaining nearly the original accuracy. In this section, we further enhance the efficiency of the k-NN algorithm. Our idea is to reject a large part of the template prototypes dynamically by carrying out computationally efficient preprocessing.
We observe that the norm of a prototype, ||·||, is a special characteristic of that prototype when appropriately defined (usually the ℓ1 or ℓ2 norm). An unknown pattern
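The two-stage matching described in the introduction can be sketched as follows. This is a hedged illustration: the paper designs one threshold per prototype, whereas a single global threshold t and the ℓ2 norm are used here for brevity, and the fallback to a full scan when no prototype survives is our addition.

```python
import math
from collections import Counter

def knn_prefiltered(template, labels, x, k=3, t=1.0):
    """Two-stage match: an O(1) norm comparison gates each O(n) distance.

    By the triangle inequality, |  ||p|| - ||x||  | <= ||p - x||, so no
    prototype within distance t of x is ever rejected in stage one.
    """
    nx = math.hypot(*x)
    norms = [math.hypot(*p) for p in template]   # precomputed once in practice
    # Stage 1: reject prototypes whose norm differs from ||x|| by more than t.
    survivors = [(p, c) for p, c, pn in zip(template, labels, norms)
                 if abs(pn - nx) <= t]
    if not survivors:                            # degenerate case: full scan
        survivors = list(zip(template, labels))
    # Stage 2: full O(n) matching only on the surviving prototypes.
    dists = sorted((math.dist(p, x), c) for p, c in survivors)
    return Counter(c for _, c in dists[:k]).most_common(1)[0][0]
```

Since the prototype norms are computed once offline, the per-query cost of stage one is a single subtraction and comparison per prototype, independent of the feature dimension n.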
Experimental results
In this section we describe the application of the two techniques of Sections 3 and 4 to handwritten numeral recognition, where the number of classes is 10 (p=10). The training set of 126,000 patterns has an equal number of patterns in each class; the testing set has 25,300 patterns, again with an equal number in each class. The experimental platform is a SPARC computer.
In our first case study, the developed techniques are applied to the “Gradient”
Conclusions and future studies
In this paper we have shown how to improve the efficiency of the k-NN classification by incorporating two novel ideas. The first idea is the reduction of the template size using the concept of attractive capacity. The second idea is a preprocessing method to preclude participation of a large portion of prototype patterns which are unlikely to match the test pattern. This work notably speeds up the classification without compromising accuracy.
The proposed template reduction technique is distinct
Acknowledgements
The authors would like to thank the anonymous referees for their numerous comments, which greatly improved and clarified the presentation.
References (14)
- et al., A new nearest-neighbor rule in the pattern classification problem, Pattern Recognition (1999)
- et al., Pattern classification using an efficient KNNR, Pattern Recognition (1992)
- et al., Nearest neighbor pattern classification, IEEE Trans. Inform. Theory (1967)
- et al., Pattern Classification and Scene Analysis (1973)
- et al., k-nearest-neighbor Bayes-risk estimation, IEEE Trans. Inform. Theory (1975)
- et al., A fuzzy k-nearest neighbor algorithm, IEEE Trans. Systems Man Cybernet. (1985)
- S.A. Dudani, The Distance-Weighted k-Nearest Neighbor Rule, in: Nearest Neighbor (NN) Norms: NN Pattern Classification...
About the Author—YINGQUAN WU received the B.S. and M.S. degrees in Mathematics from the Harbin Institute of Technology, Harbin, P. R. China, in 1995 and 1997, respectively. He received the M.S. degree in the Department of Electrical Engineering, State University of New York at Buffalo, USA, in 2000. Since 2000, he has been pursuing a Ph.D. in the Department of Electrical & Computer Engineering at the University of Illinois at Urbana-Champaign, USA.
About the Author—KRASSIMIR IANAKIEV received a Master's (Hons.) degree from Sofia University in 1989, and Ph.D. degrees in Mathematics and in Computer Science and Engineering in 1998 and 2000, respectively. His research interests include pattern recognition and fuzzy systems.
About the Author—VENU GOVINDARAJU received his Ph.D. in Computer Science from the State University of New York at Buffalo in 1992 and a Bachelor of Technology from the Indian Institute of Technology, Kharagpur, in 1986. Venu has co-authored over 115 technical papers (26 in journals) and holds one US patent on cursive script recognition. His main areas of interest are human-computer interaction and pattern recognition. He is currently the associate director of the Center of Excellence for Document Analysis and Recognition (CEDAR) and concurrently holds an associate professorship in the Department of Computer Science and Engineering, University at Buffalo. He is an associate editor of the Journal of Pattern Recognition and the IEEE Transactions on Pattern Analysis and Machine Intelligence. Venu Govindaraju is the Program Co-chair of the upcoming Eighth International Workshop on Frontiers in Handwriting Recognition (IWFHR) in 2002. He is a senior member of the IEEE.