A novel two-level nearest neighbor classification algorithm using an adaptive distance metric
Introduction
The k-nearest neighbor (kNN) rule is one of the oldest and most accurate methods for obtaining nonlinear decision boundaries in classification problems [1], [2], [3]. Given a training data set, kNN predicts the class label of an unlabeled test sample by the majority label among its k nearest neighbors in the training set. It has been shown that the 1-nearest neighbor rule has an asymptotic error rate that is at most twice the Bayes error rate, independent of the distance metric used.
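The majority-vote rule described above can be sketched in a few lines; this is a generic illustration of kNN with Euclidean distance, not the algorithm proposed later in the paper:

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest
    training samples under Euclidean distance."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.2, 0.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # -> 0
```

With k = 1 this reduces to the 1-nearest neighbor rule whose asymptotic error bound is cited above.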
With an infinite number of training samples, the class conditional probabilities are constant in some infinitesimal region around an unlabeled test sample. Euclidean distance is then the natural choice of metric for identifying the nearest neighbors, because it treats the input space as isotropic and homogeneous. In the absence of prior knowledge, most kNN-based methods use simple Euclidean distances to measure the similarities between samples. In practice, however, the number of training samples is always finite. The assumption of locally constant class conditional probabilities is therefore invalid with limited samples; that is, Euclidean distance fails to capitalize on any statistical regularity in the training data. Hence, the choice of distance measure becomes crucial in determining the outcome of nearest neighbor classification.
Distance metric learning plays a significant role in both statistical classification and information retrieval. For instance, previous studies [4], [5] have shown that an appropriate distance metric can significantly improve the classification accuracy of the kNN algorithm. Most of the work on distance metric learning falls into the following two categories:
• The metric is learned based on discriminant analysis [4], [5], [6], [7], [8], [9], [10], [11], [12].
The majority of previous works in this area attempt to learn a metric under which the nearest neighbors of an unlabeled test sample always belong to the same class while samples from different classes are separated. However, the nearest neighbors of an unlabeled test sample should be chosen so that their class conditional probabilities are equal, or approximately equal, to those of the test sample, which is not taken into account by these studies. In addition, Yang et al. [8] show that the goals of keeping data points within the same class close together while separating data points from different classes conflict, and cannot be simultaneously satisfied when classes exhibit multimodal data distributions. Graepel and Herbrich [10] show that most previous works in this area cannot incorporate data invariance to known transformations, which has been shown to improve the accuracy of a classifier. For example, Weinberger and Saul [5] learn a Mahalanobis distance metric for kNN classification so that the k nearest neighbors always belong to the same class while samples from different classes are separated by a large margin (LMNN). However, LMNN does not incorporate data invariance to known transformations. Domeniconi and Gunopulos [6] propose a local flexible metric technique using support vector machines (LFM-SVM), where the SVM serves as guidance for defining a local flexible metric. However, when kernel-based methods transform data from the input space into a high-dimensional feature space, data invariance cannot be maintained, and it is difficult to choose a proper kernel function for this task. Finally, the majority of previous works in this area rely heavily on discriminant analysis, which learns prior knowledge from training samples. It is therefore not surprising that poor performance of the SVM, e.g. under-fitting or over-fitting, results in a significant degradation in classification accuracy for the LFM-SVM classifier.
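Methods such as LMNN replace the Euclidean distance with a Mahalanobis distance d_M(x, y) = sqrt((x - y)^T M (x - y)), where M is a learned positive semidefinite matrix. The following minimal sketch only illustrates how such a metric stretches the space along chosen directions; it is not LMNN's learning procedure, and the example matrix is arbitrary:

```python
import numpy as np

def mahalanobis(x, y, M):
    """Mahalanobis distance d_M(x, y) = sqrt((x - y)^T M (x - y)),
    where M is symmetric positive semidefinite.
    M = I recovers the ordinary Euclidean distance."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(mahalanobis(x, y, np.eye(2)))            # Euclidean: sqrt(2)
print(mahalanobis(x, y, np.diag([4.0, 1.0])))  # first axis stretched: sqrt(5)
```

Learning M amounts to choosing which directions of the input space should count as "close", which is exactly what the discriminant-analysis methods above optimize.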
• The metric is learned based on local statistical analysis.
Popular algorithms in this category include [13], [14], [15], [16], [17]. Zhang et al. [13] present an algorithm which learns a distance metric from a training data set by knowledge embedding. Under the learned metric, the nearest neighbors of an unlabeled test sample in the input space are those whose coordinates are close and whose local features are similar. A kNN algorithm based on the best distance measurement (kNN-BDM) is given in [14], [15]. The best distance measurement is optimized to minimize the mean-square error between the misclassification rates of kNN with finite and infinite numbers of training samples. However, these methods depend on local statistical information and lack global discriminant analysis. This makes the predictions of kNN highly sensitive to noise or to the sampling fluctuations associated with the random nature of the process producing the training data, and leads to high-variance predictions.
In this paper, we propose a novel two-level nearest neighbor algorithm (TLNN). We first use AdaBoost as guidance to define a local flexible metric. AdaBoost has been successfully used as a classification tool in a variety of areas, and it is able to take advantage of weak hypotheses to maximize the minimum margin even after the training error of the combined hypothesis reaches zero [18], [19], [20]. Theoretical bounds on the generalization error of linear classifiers [18], [21], [22], which inspired AdaBoost, convey desirable properties to AdaBoost-based learning algorithms, making AdaBoost a natural choice to guide local information extraction. At the low level of the TLNN, we use Euclidean distance to determine a local subspace centered at an unlabeled test sample. At the high level, AdaBoost guides the local information extraction in that subspace.
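The two-level idea can be sketched as follows. This is a schematic illustration under our own simplifying assumptions, not the authors' exact formulation: the local neighborhood size, the AdaBoost configuration, and the way the boundary information modifies the metric (here, penalizing distance across the local decision boundary via the difference in decision-function values) are all hypothetical choices:

```python
from collections import Counter
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def tlnn_predict(X_train, y_train, x_query, k_local=25, k_vote=3, lam=1.0):
    """Schematic two-level prediction (illustration only).
    Low level: keep the k_local Euclidean nearest training samples.
    High level: fit AdaBoost on that neighborhood and stretch distances
    across its decision boundary, so the final k_vote neighbors tend to
    lie on the query's side of the boundary."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    local = np.argsort(dists)[:k_local]
    X_loc, y_loc = X_train[local], y_train[local]
    ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_loc, y_loc)
    f = ada.decision_function  # signed confidence for binary problems
    fq = f(x_query.reshape(1, -1))[0]
    # Adapted distance: Euclidean term plus a boundary-crossing penalty.
    adapted = dists[local] + lam * np.abs(f(X_loc) - fq)
    nearest = local[np.argsort(adapted)[:k_vote]]
    return Counter(y_train[nearest]).most_common(1)[0][0]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(2, 0.3, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
print(tlnn_predict(X, y, np.array([0.0, 0.0]), k_local=60))  # -> 0
```

With lam = 0 the sketch degenerates to plain kNN on the local subspace; increasing lam elongates the effective neighborhood along the AdaBoost decision boundary.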
The TLNN algorithm has several advantages. First, data invariance is maintained while highly stretched or elongated neighborhoods are produced along different directions. Second, the TLNN algorithm reduces the excessive dependence on statistical methods that learn prior knowledge from labeled or unlabeled data: even linear combinations of a few base classifiers produced by the weak learner in AdaBoost can yield much better kNN classifiers. Third, as the performance of AdaBoost improves, the TLNN can minimize the mean-absolute error between the misclassification rates of kNN with finite and infinite numbers of training samples. Finally, unlike other methods, our classification model is non-parametric.
Section snippets
Distance metric learning through minimizing the mean-absolute error
Let x be a query point whose class label we want to predict, and let x′ be the training point closest to x. The probability of misclassifying x by the nearest neighbor classification method is [15]:

P_N(e | x, x′) = P(w1 | x) P(w2 | x′) + P(w2 | x) P(w1 | x′),

where N denotes the number of training samples, and w1 and w2 represent the class labels. For simplicity, only a binary classification task is considered herein.
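The classical two-class expression for the nearest neighbor error given a query x and its neighbor x′, namely P(e | x, x′) = P(w1|x) P(w2|x′) + P(w2|x) P(w1|x′), can be checked numerically. This small sketch uses example posterior values of our own choosing:

```python
def nn_error(p1_x, p1_nn):
    """Two-class nearest neighbor misclassification probability
    P(e | x, x') = P(w1|x) P(w2|x') + P(w2|x) P(w1|x'),
    parameterized by the posterior of class w1 at x and at x'."""
    return p1_x * (1.0 - p1_nn) + (1.0 - p1_x) * p1_nn

# When the posteriors at x and x' agree (locally constant class
# probabilities), the error reduces to 2 p (1 - p):
print(nn_error(0.9, 0.9))  # 0.9*0.1 + 0.1*0.9 = 0.18
```

Note that 0.18 is at most twice the Bayes error of 0.1 at this point, consistent with the asymptotic bound quoted in the introduction.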
A novel two-level nearest neighbor classification algorithm (TLNN)
A nearest neighbor classifier using AdaBoost was presented in the previous section in order to minimize the mean-absolute error between the misclassification rates of kNN with finite and infinite numbers of training samples. However, this method relies heavily on the classifier f(x). The solution provided by AdaBoost produces highly stretched neighborhoods along boundary directions and guides the local information extraction in such areas. Thus the nearest neighbor classifier using AdaBoost generally has poor
Experiments
In this section we compare several classification methods using real data sets and data sets with artificially controlled label noise. For the experiments presented in this paper, we use a selection of two-class data sets from the UC Irvine repository [23]. Instances with missing attribute values are deleted from the selected data sets. These data sets are listed in Table 1.
In all the experiments, the features are first normalized over the training data to have zero mean and unit variance, and
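The normalization step described above can be sketched as follows; the key point is that the mean and variance are estimated on the training data only and then applied unchanged to test data:

```python
import numpy as np

def standardize(X_train, X_test):
    """Normalize features to zero mean and unit variance using
    statistics computed on the training data only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma

X_tr = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
X_te = np.array([[3.0, 20.0]])
Z_tr, Z_te = standardize(X_tr, X_te)
print(Z_tr.mean(axis=0))  # ~[0, 0]
print(Z_tr.std(axis=0))   # ~[1, 1]
```

Without this step, features with large numeric ranges would dominate the Euclidean distances used at the low level of the TLNN.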
Conclusion
In this paper, a two-level nearest neighbor algorithm (TLNN) is proposed to show how to guide local information extraction for kNN classification using the AdaBoost algorithm. Since AdaBoost is able to take advantage of weak hypotheses to maximize the minimum margin, the TLNN algorithm ensures that the nearest neighbors always belong to the same class, with the same or similar class conditional probabilities, while samples from different classes are separated by a large margin. The double-layer design
Acknowledgements
This project was funded by funds from the National Natural Science Foundation of China (No. 61174161), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20090121110022), the Fundamental Research Funds for the Central Universities of Xiamen University (Nos. 2011121047, 201112G018 and CXB2011035), the Key Research Project of Fujian Province of China (No. 2009H0044) and Xiamen University National 211 3rd Period Project of China (No. 0630-E72000).
References (24)
- et al., Learning a locality discriminating projection for classification, Knowledge-Based Systems (2009)
- et al., Distance metric learning by knowledge embedding, Pattern Recognition (2004)
- et al., K nearest neighbors with mutual information for simultaneous classification and missing data imputation, Neurocomputing (2009)
- et al., Local distance-based classification, Knowledge-Based Systems (2008)
- et al., Edited AdaBoost by weighted kNN, Neurocomputing (2010)
- et al., Dynamic Adaboost learning with feature selection based on parallel genetic algorithm for image annotation, Knowledge-Based Systems (2010)
- et al., Discriminant adaptive nearest neighbor classification, IEEE Transactions on Pattern Analysis and Machine Intelligence (1996)
- Similarity metric learning for a variable-kernel classifier, Neural Computation (1995)
- et al., Regression on feature projections, Knowledge-Based Systems (2000)
- et al., Neighborhood components analysis, NIPS: Advances in Neural Information Processing Systems (2004)
- Distance metric learning for large margin nearest neighbor classification, Journal of Machine Learning Research
- Adaptive nearest neighbor classification using support vector machines, NIPS: Advances in Neural Information Processing Systems