Abstract
The k-NN classifier is one of the best-known and most widely used nonparametric classifiers. The k-NN rule is asymptotically optimal: its classification error converges to the Bayes error as the number of training samples approaches infinity. Many extensions of the traditional k-NN have been developed to improve classification accuracy. However, it is also well known that as the number of samples grows, k-NN can become very inefficient, because the distances from the test sample to every sample in the training set must be computed. In this paper, a simple method that addresses this issue is proposed. Combining the k-NN classifier with the centroid neighbor classifier improves the speed of the algorithm without changing the results of the original k-NN. Moreover, using confusion matrices and excluding outliers makes the resulting algorithm considerably faster and more robust.
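The combination described in the abstract can be sketched as follows. This is a hypothetical illustration only: the paper's exact pruning rule is not given here, so the function names, the parameter m (number of candidate classes kept after the centroid step), and the majority-vote tie handling are all assumptions, not the authors' method.

```python
import numpy as np

def centroid_knn_predict(X_train, y_train, x, k=5, m=2):
    """Sketch: use class centroids to pre-select the m closest classes,
    then run a standard k-NN vote only on samples of those classes,
    avoiding distance computations to the rest of the training set."""
    classes = np.unique(y_train)
    # One centroid per class: the mean of that class's training samples
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    # Distance from the test sample to each class centroid
    d_cent = np.linalg.norm(centroids - x, axis=1)
    keep = classes[np.argsort(d_cent)[:m]]
    # Restrict the training set to the m nearest classes
    mask = np.isin(y_train, keep)
    Xs, ys = X_train[mask], y_train[mask]
    # Standard k-NN majority vote on the reduced set
    d = np.linalg.norm(Xs - x, axis=1)
    nearest = ys[np.argsort(d)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]
```

When m equals the total number of classes, this reduces to plain k-NN; smaller m trades exactness for speed, which is the trade-off the paper's confusion-matrix and outlier-exclusion steps are meant to control.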
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Chmielnicki, W. (2016). Combining k-Nearest Neighbor and Centroid Neighbor Classifier for Fast and Robust Classification. In: Martínez-Álvarez, F., Troncoso, A., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2016. Lecture Notes in Computer Science(), vol 9648. Springer, Cham. https://doi.org/10.1007/978-3-319-32034-2_45
Print ISBN: 978-3-319-32033-5
Online ISBN: 978-3-319-32034-2