Knowledge-Based Systems

Volume 26, February 2012, Pages 103-110

A novel two-level nearest neighbor classification algorithm using an adaptive distance metric

https://doi.org/10.1016/j.knosys.2011.07.010

Abstract

When the training set contains an infinite number of samples, the outcome of nearest neighbor (kNN) classification is independent of the adopted distance metric. In practice, however, the number of training samples is always finite, so the choice of distance metric becomes crucial in determining the performance of kNN. We propose a novel two-level nearest neighbor algorithm (TLNN) that minimizes the mean-absolute error between the misclassification rates of kNN with finite and infinite numbers of training samples. At the low level, Euclidean distance is used to determine a local subspace centered at an unlabeled test sample. At the high level, AdaBoost is used as guidance for local information extraction. TLNN maintains data invariance while producing highly stretched, or elongated, neighborhoods along different directions. The TLNN algorithm reduces the excessive dependence on statistical methods that learn prior knowledge from the training data. Even a linear combination of a few base classifiers produced by the weak learner in AdaBoost can yield much better kNN classifiers. Experiments on both synthetic and real-world data sets provide justification for the proposed method.

Introduction

The nearest neighbor (kNN) rule is one of the oldest and most accurate methods for obtaining nonlinear decision boundaries in classification problems [1], [2], [3]. Given a training data set, kNN predicts the class label of an unlabeled test sample by the majority label among its k nearest neighbors in the training set. It has been shown that the 1-nearest neighbor rule has an asymptotic error rate that is at most twice the Bayes error rate, independent of the distance metric used.
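As a point of reference for the discussion that follows, the sketch below (not from the paper; plain NumPy, Euclidean distance assumed, function name my own) shows the standard kNN prediction rule just described:

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_query, k=5):
        """Predict the label of x_query by majority vote among its
        k nearest training samples under the Euclidean distance."""
        dists = np.linalg.norm(X_train - x_query, axis=1)  # distances to all training points
        nearest = np.argsort(dists)[:k]                    # indices of the k closest samples
        votes = Counter(y_train[nearest])                  # count labels among the neighbors
        return votes.most_common(1)[0][0]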

When the training set contains an infinite number of samples, the class conditional probabilities are constant in an infinitesimal region around an unlabeled test sample. Euclidean distance is then the natural choice of metric for identifying the nearest neighbors, because it treats the input space as isotropic and homogeneous. In the absence of prior knowledge, most kNN-based methods use simple Euclidean distances to measure the similarities between samples. In practice, however, the number of training samples is always finite. With limited samples the assumption of locally constant class conditional probabilities is invalid; that is, Euclidean distance fails to capitalize on any statistical regularity in the training data. Hence, selecting a distance measure becomes crucial in determining the outcome of nearest neighbor classification.

Distance metric learning has played a significant role in both statistical classification and information retrieval. For instance, previous studies [4], [5] have shown that appropriate distance metrics can significantly improve the classification accuracy of the kNN algorithm. Most of the work on distance metric learning falls into the following two categories:

• The metric is learned based on discriminant analysis [4], [5], [6], [7], [8], [9], [10], [11], [12].

The majority of previous works in this area attempt to learn a metric over the input space under which the nearest neighbors of an unlabeled test sample always belong to the same class while samples from different classes are separated (a small illustrative sketch of such a learned metric is given after this list). However, the nearest neighbors of an unlabeled test sample should be chosen so that their class conditional probabilities are equal, or approximately equal, to those of the test sample, a requirement that these studies do not take into account. In addition, Yang et al. [8] show that the goals of keeping data points of the same class close together and separating data points of different classes conflict and cannot be simultaneously satisfied when classes exhibit multimodal distributions. Graepel and Herbrich [10] show that most previous works in this area cannot incorporate invariance of the data to known transformations, which has been shown to improve the accuracy of a classifier. For example, Weinberger and Saul [5] learn a Mahalanobis distance metric for kNN classification so that the k nearest neighbors always belong to the same class while samples from different classes are separated by a large margin (LMNN); however, LMNN does not incorporate invariance to known transformations. Domeniconi and Gunopulos [6] propose a local flexible metric technique based on support vector machines (LFM-SVM), where the SVM is used as guidance to define a local flexible metric. However, when kernel-based methods transform data from the input space to a high-dimensional feature space, data invariance cannot be maintained, and it is difficult to choose a proper kernel function for this task. Finally, the majority of previous works in this area rely heavily on discriminant analysis, which learns prior knowledge from training samples. It is therefore not surprising that poor performance of the SVM in some cases, such as under-fitting or over-fitting, results in a significant degradation of classification accuracy for the LFM-SVM classifier.

• The metric is learned based on local statistical analysis.

Popular algorithms in this category include [13], [14], [15], [16], [17]. Zhang et al. [13] present an algorithm that learns a distance metric from a training data set by knowledge embedding; under the new metric, the nearest neighbors of an unlabeled test sample are samples whose coordinates are close and whose local features are similar. A kNN algorithm based on the best distance measurement (kNN-BDM) is given in [14], [15], where the distance measure is optimized to minimize the mean-square error between the misclassification rates of kNN with finite and infinite numbers of training samples. However, these methods depend on local statistical information and lack global discriminant analysis. This makes kNN predictions highly sensitive to noise and to the sampling fluctuations associated with the random process producing the training data, and leads to high-variance predictions.
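Concerning the first category above, a learned metric such as LMNN's [5] is typically parameterized by a positive semidefinite matrix M and used in place of the Euclidean distance during neighbor search. The following is a minimal sketch of that idea (illustrative only; M is assumed to have been learned elsewhere, and the function names are my own):

    import numpy as np

    def mahalanobis_dist(x, y, M):
        """Distance under a learned metric, d_M(x, y) = sqrt((x - y)^T M (x - y)).
        M is a positive semidefinite matrix; M = I recovers the Euclidean distance."""
        d = x - y
        return np.sqrt(d @ M @ d)

    def nearest_under_metric(X_train, x_query, M, k=5):
        # Rank training samples by the learned metric instead of the Euclidean one.
        dists = np.array([mahalanobis_dist(x, x_query, M) for x in X_train])
        return np.argsort(dists)[:k]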

In this paper, we propose a novel two-level nearest neighbor algorithm (TLNN). We first use AdaBoost as guidance to define a local flexible metric. AdaBoost has been successfully used as a classification tool in a variety of areas, and it is able to exploit weak hypotheses to maximize the minimum margin even when the training error of the combined hypotheses is zero [18], [19], [20]. The theoretical bounds on the generalization error of linear classifiers [18], [21], [22] that inspired AdaBoost carry over desirable properties to AdaBoost-based learning algorithms, and AdaBoost is therefore a natural choice to guide local information extraction. At the low level of the TLNN, we use Euclidean distance to determine a local subspace centered at an unlabeled test sample. At the high level, AdaBoost guides the local information extraction in the subspace centered at the unlabeled test sample.
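As an illustration of the kind of guidance a boosted ensemble can provide (a sketch using scikit-learn's AdaBoostClassifier with its default decision-stump weak learner, not the authors' implementation), the signed ensemble output acts as a margin-like score whose sign gives the predicted class and whose magnitude indicates how far a sample lies from the learned decision boundary:

    from sklearn.ensemble import AdaBoostClassifier

    def boosted_margins(X_train, y_train, X_eval, n_estimators=20):
        """Fit AdaBoost with decision stumps (the default weak learner) and return
        the signed ensemble score for each evaluation sample; samples with small
        |score| lie close to the learned decision boundary."""
        ada = AdaBoostClassifier(n_estimators=n_estimators)
        ada.fit(X_train, y_train)
        return ada.decision_function(X_eval)  # > 0 for one class, < 0 for the other

Even a small ensemble (n_estimators here is an arbitrary illustrative value) produces such a score, which is consistent with the observation below that a few base classifiers already suffice to improve kNN.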

The TLNN algorithm has several advantages. First, data invariance is maintained while highly stretched, or elongated, neighborhoods along different directions are produced. Second, the TLNN algorithm reduces the excessive dependence on statistical methods that learn prior knowledge from labeled or unlabeled data; even a linear combination of a few base classifiers produced by the weak learner in AdaBoost can yield much better kNN classifiers. Third, as the performance of AdaBoost improves, TLNN further reduces the mean-absolute error between the misclassification rates of kNN with finite and infinite numbers of training samples. Finally, in contrast to other methods, our classification model is non-parametric.


Distance metric learning through minimizing the mean-absolute error

Let x be a query point whose class label we want to predict, and let x′ be the training point closest to x. The probability of misclassifying x by the nearest neighbor rule is given by [15]:

$$
\begin{aligned}
P_N(e|x) &= P(w_1|x)\,P(w_2|x') + P(w_2|x)\,P(w_1|x')\\
         &= P(w_1|x)\,[1 - P(w_1|x')] + P(w_2|x)\,P(w_1|x')\\
         &= 2P(w_1|x)\,P(w_2|x) + [P(w_1|x) - P(w_2|x)]\cdot[P(w_1|x) - P(w_1|x')],
\end{aligned}
$$

where N denotes the number of training samples and w_1 and w_2 denote the two class labels. For simplicity, only a binary classification task is considered herein. When N → ∞, the
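Although the snippet is cut off at this point, the limiting step it refers to is the standard Cover–Hart argument (added here as a note, not quoted from the paper): as N → ∞ the nearest neighbor x′ converges to x, so P(w_1|x′) → P(w_1|x) and the second term above vanishes, leaving

$$
\lim_{N\to\infty} P_N(e|x) = 2P(w_1|x)\,P(w_2|x).
$$

Consequently, the gap between the finite-sample and asymptotic error rates is exactly the term [P(w_1|x) − P(w_2|x)]·[P(w_1|x) − P(w_1|x′)], and minimizing its mean-absolute value amounts to preferring neighbors whose class conditional probabilities match those at x, in line with the requirement stated in the introduction.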

A novel two-level nearest neighbor classification algorithm (TLNN)

A nearest neighbor classifier using AdaBoost was presented in the previous section in order to minimize the mean-absolute error between the misclassification rates of kNN with finite and infinite numbers of training samples. But this method relies heavily on the classifier f(x). The solution provided by AdaBoost produces highly stretched neighborhoods along boundary directions and guides local information extraction in such areas. Thus the nearest neighbor classifier using AdaBoost generally has poor
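Although the remainder of this section is truncated here, the two-level structure described in the introduction can be sketched as follows. This is my own illustrative reconstruction, not the authors' code: the neighborhood sizes k_low and k_high and the score-similarity filter at the high level are assumptions, and the actual TLNN derives its local metric from the AdaBoost output rather than filtering by score alone.

    import numpy as np
    from collections import Counter
    from sklearn.ensemble import AdaBoostClassifier

    def tlnn_predict(X_train, y_train, x_query, k_low=50, k_high=5):
        """Two-level nearest neighbor sketch (illustrative).
        Low level : take the k_low Euclidean neighbors of the query.
        High level: keep the k_high of them whose AdaBoost scores are closest to
                    the query's score, i.e. neighbors the boosted ensemble places
                    in a similar region, then take a majority vote."""
        # Low level: Euclidean subspace centered at the query.
        dists = np.linalg.norm(X_train - x_query, axis=1)
        local = np.argsort(dists)[:k_low]

        # High level: AdaBoost (decision stumps) trained on the full training set
        # guides which local neighbors are retained (n_estimators is illustrative).
        ada = AdaBoostClassifier(n_estimators=20).fit(X_train, y_train)
        q_score = ada.decision_function(x_query.reshape(1, -1))[0]
        n_scores = ada.decision_function(X_train[local])
        keep = local[np.argsort(np.abs(n_scores - q_score))[:k_high]]

        return Counter(y_train[keep]).most_common(1)[0][0]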

Experiments

In this section we compare several classification methods using real data sets and data sets with artificially controlled label noise. For the experiments presented in this paper, we use a selection of two-class data sets from the UC Irvine database [23]. Instances with missing attribute values are deleted from the selected data sets. These data sets are listed in Table 1.

In all the experiments, the features are first normalized over the training data to have zero mean and unit variance, and
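The snippet ends here, but the preprocessing it describes can be made concrete with a short sketch (using scikit-learn's StandardScaler; the rest of the authors' experimental pipeline is not shown in the snippet): the standardization statistics are computed on the training split only and then applied unchanged to the test split.

    from sklearn.preprocessing import StandardScaler

    def standardize(X_train, X_test):
        """Zero-mean, unit-variance scaling fitted on the training data only,
        then applied to the test data to avoid information leakage."""
        scaler = StandardScaler().fit(X_train)
        return scaler.transform(X_train), scaler.transform(X_test)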

Conclusion

In this paper, a two-level nearest neighbor algorithm (TLNN) is proposed to show how to guide local information extraction for kNN classification using the AdaBoost algorithm. Since AdaBoost is able to exploit weak hypotheses to maximize the minimum margin, the TLNN algorithm ensures that the nearest neighbors always belong to the same class, with the same or similar class conditional probabilities, while samples from different classes are separated by a large margin. The double-layer design

Acknowledgements

This project was funded by funds from the National Natural Science Foundation of China (No. 61174161), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20090121110022), the Fundamental Research Funds for the Central Universities of Xiamen University (Nos. 2011121047, 201112G018 and CXB2011035), the Key Research Project of Fujian Province of China (No. 2009H0044) and Xiamen University National 211 3rd Period Project of China (No. 0630-E72000).

References (24)

• K.Q. Weinberger, L.K. Saul, Distance metric learning for large margin nearest neighbor classification, Journal of Machine Learning Research (2009).

• C. Domeniconi, D. Gunopulos, Adaptive nearest neighbor classification using support vector machines, NIPS: Advances in Neural Information Processing Systems (2001).