A novel two-level nearest neighbor classification algorithm using an adaptive distance metric
Introduction
The k-nearest neighbor (kNN) rule is one of the oldest and most accurate methods for obtaining nonlinear decision boundaries in classification problems [1], [2], [3]. Given a training data set, kNN predicts the class label of an unlabeled test sample by the majority label among its k nearest neighbors in the training set. It has been shown that the 1-nearest neighbor rule has an asymptotic error rate that is at most twice the Bayes error rate, independent of the distance metric used.
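The majority-vote rule described above can be sketched in a few lines; this is a generic illustration of kNN with Euclidean distance, not the algorithm proposed later in the paper:

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest
    training samples under Euclidean distance."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.2, 0.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # -> 0
```

With k = 1 this reduces to the 1-nearest neighbor rule whose asymptotic error bound is cited above.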
With an infinite number of training samples, the class conditional probabilities are constant in some infinitesimal region around an unlabeled test sample. Euclidean distance is then the natural choice of metric for identifying the nearest neighbors, because it treats the input space as isotropic and homogeneous. In the absence of prior knowledge, most kNN-based methods use simple Euclidean distances to measure the similarities between samples. In practice, however, the number of training samples is always finite. The assumption of locally constant class conditional probabilities is therefore invalid with limited samples; that is, Euclidean distance fails to capitalize on any statistical regularity in the training data. Hence, the choice of distance measure becomes crucial in determining the outcome of nearest neighbor classification.
Distance metric learning plays a significant role in both statistical classification and information retrieval. For instance, previous studies [4], [5] have shown that an appropriate distance metric can significantly improve the classification accuracy of the kNN algorithm. Most of the work on distance metric learning falls into the following two categories:
• The metric is learned based on discriminant analysis [4], [5], [6], [7], [8], [9], [10], [11], [12].
The majority of previous works in this area attempt to learn a metric under which the nearest neighbors of an unlabeled test sample always belong to the same class while samples from different classes are separated. However, the nearest neighbors of an unlabeled test sample should be chosen so that their class conditional probabilities are equal, or approximately equal, to those of the test sample, which is not taken into account by these studies. In addition, Yang et al. [8] show that the goals of keeping data points within the same class close together while separating data points from different classes conflict, and cannot be simultaneously satisfied when classes exhibit multimodal data distributions. Graepel and Herbrich [10] show that most previous works in this area cannot incorporate data invariance to known transformations, which has been shown to improve the accuracy of a classifier. For example, Weinberger and Saul [5] learn a Mahalanobis distance metric for kNN classification so that the k nearest neighbors always belong to the same class while samples from different classes are separated by a large margin (LMNN). However, LMNN does not incorporate data invariance to known transformations. Domeniconi and Gunopulos [6] propose a local flexible metric technique using support vector machines (LFM-SVM), where the SVM serves as guidance for defining a local flexible metric. However, when kernel-based methods transform data from the input space into a high-dimensional feature space, data invariance cannot be maintained, and it is difficult to choose a proper kernel function for this task. Finally, the majority of previous works in this area rely heavily on discriminant analysis, which learns prior knowledge from training samples. It is therefore not surprising that poor performance of the SVM, e.g. under-fitting or over-fitting, results in a significant degradation in classification accuracy for the LFM-SVM classifier.
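Methods such as LMNN replace the Euclidean distance with a Mahalanobis distance d_M(x, y) = sqrt((x - y)^T M (x - y)), where M is a learned positive semidefinite matrix. The following minimal sketch only illustrates how such a metric stretches the space along chosen directions; it is not LMNN's learning procedure, and the example matrix is arbitrary:

```python
import numpy as np

def mahalanobis(x, y, M):
    """Mahalanobis distance d_M(x, y) = sqrt((x - y)^T M (x - y)),
    where M is symmetric positive semidefinite.
    M = I recovers the ordinary Euclidean distance."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(mahalanobis(x, y, np.eye(2)))            # Euclidean: sqrt(2)
print(mahalanobis(x, y, np.diag([4.0, 1.0])))  # first axis stretched: sqrt(5)
```

Learning M amounts to choosing which directions of the input space should count as "close", which is exactly what the discriminant-analysis methods above optimize.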
• The metric is learned based on local statistical analysis.
Popular algorithms in this category include [13], [14], [15], [16], [17]. Zhang et al. [13] present an algorithm which learns a distance metric from a training data set by knowledge embedding. Under the learned metric, the nearest neighbors of an unlabeled test sample in the input space are those whose coordinates are close and whose local features are similar. A kNN algorithm based on the best distance measurement (kNN-BDM) is given in [14], [15]. The best distance measurement is optimized to minimize the mean-square error between the misclassification rates of kNN with finite and infinite numbers of training samples. However, these methods depend on local statistical information and lack global discriminant analysis. This makes the predictions of kNN highly sensitive to noise or to the sampling fluctuations associated with the random nature of the process producing the training data, and leads to high-variance predictions.
In this paper, we propose a novel two-level nearest neighbor algorithm (TLNN). We first use AdaBoost as guidance to define a local flexible metric. AdaBoost has been successfully used as a classification tool in a variety of areas, and it is able to take advantage of weak hypotheses to maximize the minimum margin even after the training error of the combined hypothesis reaches zero [18], [19], [20]. Theoretical bounds on the generalization error of linear classifiers [18], [21], [22], which inspired AdaBoost, convey desirable properties to AdaBoost-based learning algorithms, making AdaBoost a natural choice to guide local information extraction. At the low level of the TLNN, we use Euclidean distance to determine a local subspace centered at an unlabeled test sample. At the high level, AdaBoost guides the local information extraction in that subspace.
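The two-level idea can be sketched as follows. This is a schematic illustration under our own simplifying assumptions, not the authors' exact formulation: the local neighborhood size, the AdaBoost configuration, and the way the boundary information modifies the metric (here, penalizing distance across the local decision boundary via the difference in decision-function values) are all hypothetical choices:

```python
from collections import Counter
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def tlnn_predict(X_train, y_train, x_query, k_local=25, k_vote=3, lam=1.0):
    """Schematic two-level prediction (illustration only).
    Low level: keep the k_local Euclidean nearest training samples.
    High level: fit AdaBoost on that neighborhood and stretch distances
    across its decision boundary, so the final k_vote neighbors tend to
    lie on the query's side of the boundary."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    local = np.argsort(dists)[:k_local]
    X_loc, y_loc = X_train[local], y_train[local]
    ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_loc, y_loc)
    f = ada.decision_function  # signed confidence for binary problems
    fq = f(x_query.reshape(1, -1))[0]
    # Adapted distance: Euclidean term plus a boundary-crossing penalty.
    adapted = dists[local] + lam * np.abs(f(X_loc) - fq)
    nearest = local[np.argsort(adapted)[:k_vote]]
    return Counter(y_train[nearest]).most_common(1)[0][0]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(2, 0.3, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
print(tlnn_predict(X, y, np.array([0.0, 0.0]), k_local=60))  # -> 0
```

With lam = 0 the sketch degenerates to plain kNN on the local subspace; increasing lam elongates the effective neighborhood along the AdaBoost decision boundary.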
The TLNN algorithm has several advantages. First, data invariance is maintained while highly stretched or elongated neighborhoods are produced along different directions. Second, the TLNN algorithm reduces the excessive dependence on statistical methods that learn prior knowledge from labeled or unlabeled data: even linear combinations of a few base classifiers produced by the weak learner in AdaBoost can yield much better kNN classifiers. Third, as the performance of AdaBoost improves, the TLNN can minimize the mean-absolute error between the misclassification rates of kNN with finite and infinite numbers of training samples. Finally, unlike other methods, our classification model is non-parametric.
Section snippets
Distance metric learning through minimizing the mean-absolute error
Let x be a query point whose class label we want to predict, and let x′ be the training point closest to x. The probability of misclassifying x by the nearest neighbor classification method is [15]:

P_N(e | x, x′) = P(w1 | x) P(w2 | x′) + P(w2 | x) P(w1 | x′),

where N denotes the number of training samples, and w1 and w2 represent the class labels. For simplicity, only a binary classification task is considered herein.
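The classical two-class expression for the nearest neighbor error given a query x and its neighbor x′, namely P(e | x, x′) = P(w1|x) P(w2|x′) + P(w2|x) P(w1|x′), can be checked numerically. This small sketch uses example posterior values of our own choosing:

```python
def nn_error(p1_x, p1_nn):
    """Two-class nearest neighbor misclassification probability
    P(e | x, x') = P(w1|x) P(w2|x') + P(w2|x) P(w1|x'),
    parameterized by the posterior of class w1 at x and at x'."""
    return p1_x * (1.0 - p1_nn) + (1.0 - p1_x) * p1_nn

# When the posteriors at x and x' agree (locally constant class
# probabilities), the error reduces to 2 p (1 - p):
print(nn_error(0.9, 0.9))  # 0.9*0.1 + 0.1*0.9 = 0.18
```

Note that 0.18 is at most twice the Bayes error of 0.1 at this point, consistent with the asymptotic bound quoted in the introduction.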
A novel two-level nearest neighbor classification algorithm (TLNN)
A nearest neighbor classifier using AdaBoost was presented in the previous section in order to minimize the mean-absolute error between the misclassification rates of kNN with finite and infinite numbers of training samples. However, this method relies heavily on the classifier f(x). The solution provided by AdaBoost produces highly stretched neighborhoods along boundary directions and guides the local information extraction in such areas. Thus the nearest neighbor classifier using AdaBoost generally has poor
Experiments
In this section we compare several classification methods using real data sets and data sets with artificially controlled label noise. For the experiments presented in this paper, we use a selection of two-class data sets from the UC Irvine repository [23]. Instances with missing attribute values are deleted from the selected data sets. These data sets are listed in Table 1.
In all the experiments, the features are first normalized over the training data to have zero mean and unit variance, and
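The normalization step described above can be sketched as follows; the key point is that the mean and variance are estimated on the training data only and then applied unchanged to test data:

```python
import numpy as np

def standardize(X_train, X_test):
    """Normalize features to zero mean and unit variance using
    statistics computed on the training data only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma

X_tr = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
X_te = np.array([[3.0, 20.0]])
Z_tr, Z_te = standardize(X_tr, X_te)
print(Z_tr.mean(axis=0))  # ~[0, 0]
print(Z_tr.std(axis=0))   # ~[1, 1]
```

Without this step, features with large numeric ranges would dominate the Euclidean distances used at the low level of the TLNN.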
Conclusion
In this paper, a two-level nearest neighbor algorithm (TLNN) is proposed to show how to guide local information extraction for kNN classification using the AdaBoost algorithm. Since AdaBoost is able to take advantage of weak hypotheses to maximize the minimum margin, the TLNN algorithm ensures that the nearest neighbors always belong to the same class, with the same or similar class conditional probabilities, while samples from different classes are separated by a large margin. The double-layer design
Acknowledgements
This project was funded by funds from the National Natural Science Foundation of China (No. 61174161), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20090121110022), the Fundamental Research Funds for the Central Universities of Xiamen University (Nos. 2011121047, 201112G018 and CXB2011035), the Key Research Project of Fujian Province of China (No. 2009H0044) and Xiamen University National 211 3rd Period Project of China (No. 0630-E72000).
References (24)
- et al., Learning a locality discriminating projection for classification, Knowledge-Based Systems (2009)
- et al., Distance metric learning by knowledge embedding, Pattern Recognition (2004)
- et al., K nearest neighbors with mutual information for simultaneous classification and missing data imputation, Neurocomputing (2009)
- et al., Local distance-based classification, Knowledge-Based Systems (2008)
- et al., Edited AdaBoost by weighted kNN, Neurocomputing (2010)
- et al., Dynamic Adaboost learning with feature selection based on parallel genetic algorithm for image annotation, Knowledge-Based Systems (2010)
- et al., Discriminant adaptive nearest neighbor classification, IEEE Transactions on Pattern Analysis and Machine Intelligence (1996)
- Similarity metric learning for a variable-kernel classifier, Neural Computation (1995)
- et al., Regression on feature projections, Knowledge-Based Systems (2000)
- et al., Neighborhood components analysis, NIPS: Advances in Neural Information Processing Systems (2004)
- Distance metric learning for large margin nearest neighbor classification, Journal of Machine Learning Research
- Adaptive nearest neighbor classification using support vector machines, NIPS: Advances in Neural Information Processing Systems