Abstract
k-NN is a supervised machine learning method widely used across domains. Despite its simplicity, effectiveness, and robustness, k-NN is limited by its reliance on the Euclidean distance as the similarity metric, the arbitrary choice of the neighborhood size k, the computational cost of high-dimensional data, and the use of a simple majority voting rule. We sought to address the last issue and proposed the Centroid Displacement-based k-NN (CDNN), a k-NN variant for classification in which centroid displacement is used for class determination. In this study, we present an implementation of CDNN for scikit-learn, a well-known machine learning library for the Python programming language, and a comprehensive comparative performance analysis of CDNN against the k-NN variants available in scikit-learn. We release our implementation as open source, and to the best of our knowledge, no comparable performance analysis of k-NN and its variants in scikit-learn has been published. We also examine how the choice of distance metric affects the performance of CDNN on different datasets. Extensive experiments on real-world and synthetic datasets verify the effectiveness of CDNN compared to the standard k-NN and other state-of-the-art k-NN-based algorithms. The distance metric comparison also shows that other distance metrics can further improve the classification performance of CDNN.
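The idea of replacing majority voting with centroid displacement can be illustrated with a minimal sketch. It is not the authors' scikit-learn implementation, and it assumes one particular reading of the rule: among the k nearest neighbors, each class's centroid is computed with and without the query point, and the class whose centroid moves the least is predicted. The names `cdnn_predict`, `majority_predict`, `euclidean`, and `manhattan`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance, as an example of an alternative metric."""
    return sum(abs(x - y) for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def k_nearest(X_train, query, k, metric):
    """Indices of the k training points closest to the query."""
    order = sorted(range(len(X_train)), key=lambda i: metric(X_train[i], query))
    return order[:k]


def majority_predict(X_train, y_train, query, k=3, metric=euclidean):
    """Standard k-NN: the most frequent label among the k neighbors."""
    labels = [y_train[i] for i in k_nearest(X_train, query, k, metric)]
    return Counter(labels).most_common(1)[0][0]


def cdnn_predict(X_train, y_train, query, k=3, metric=euclidean):
    """Assumed centroid-displacement rule: pick the class whose
    neighbor centroid shifts least when the query is added to it."""
    groups = defaultdict(list)
    for i in k_nearest(X_train, query, k, metric):
        groups[y_train[i]].append(X_train[i])

    best_label, best_shift = None, math.inf
    for label, members in groups.items():
        shift = metric(centroid(members), centroid(members + [query]))
        if shift < best_shift:
            best_label, best_shift = label, shift
    return best_label


# Toy data: one very close "a" neighbor, two more distant "b" neighbors.
X_train = [[0.1, 0.0], [2.0, 0.0], [2.2, 0.0], [5.0, 5.0], [6.0, 0.0]]
y_train = ["a", "b", "b", "a", "b"]
query = [0.0, 0.0]

print(majority_predict(X_train, y_train, query, k=3))            # b
print(cdnn_predict(X_train, y_train, query, k=3))                # a
print(cdnn_predict(X_train, y_train, query, k=3, metric=manhattan))  # a
```

In this toy case the three nearest neighbors contain two "b" points, so majority voting predicts "b", while the single "a" neighbor lies so close to the query that its centroid barely moves, so the displacement rule predicts "a". Passing the metric as a parameter mirrors the abstract's point that the distance measure is a choice that can be varied.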
Acknowledgments
The work of BPN was partly supported by a research programme funded by the New Zealand Ministry of Business Innovation and Employment (MBIE) under contract VUW RTVU1905 and the MBIE Strategic Science Investment Fund for Data Science under contract VUW RTVU1914.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, A.X., Chukova, S.S., Nguyen, B.P. (2022). Implementation and Analysis of Centroid Displacement-Based k-Nearest Neighbors. In: Chen, W., Yao, L., Cai, T., Pan, S., Shen, T., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science, vol 13725. Springer, Cham. https://doi.org/10.1007/978-3-031-22064-7_31
DOI: https://doi.org/10.1007/978-3-031-22064-7_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22063-0
Online ISBN: 978-3-031-22064-7
eBook Packages: Computer Science, Computer Science (R0)