Abstract
The k-Nearest Neighbor (k-NN) classifier is a widely used and effective classification method. Its main drawback is the high computational cost incurred when it is applied to large datasets. Many Data Reduction Techniques have been proposed to speed up the classification process; however, their effectiveness depends on the level of noise in the data. This paper shows that the k-means clustering algorithm can serve as a noise-tolerant Data Reduction Technique. The experimental study illustrates that when the reduced dataset consists of the k-means centroids as representatives of the initial data, classification performance is much less degraded by the addition of noise.
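The idea described in the abstract can be sketched as follows: run k-means separately on the instances of each class, keep only the resulting centroids (labeled with their class) as the reduced training set, and classify queries with k-NN over that much smaller set. This is a minimal illustration, not the authors' exact algorithm; the function names and the choice of k centroids per class are assumptions for the sketch.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: return k centroids of the rows of X."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):          # skip empty clusters
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids

def reduce_by_centroids(X, y, k_per_class=3):
    """Build the reduced set: k-means centroids computed per class."""
    reps, rep_labels = [], []
    for c in np.unique(y):
        cents = kmeans(X[y == c], k_per_class)
        reps.append(cents)
        rep_labels.extend([c] * len(cents))
    return np.vstack(reps), np.array(rep_labels)

def knn_predict(reps, rep_labels, query, k=1):
    """Majority vote among the k nearest representatives."""
    dists = np.linalg.norm(reps - query, axis=1)
    nearest = rep_labels[np.argsort(dists)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[counts.argmax()]
```

Because averaging smooths out mislabeled or perturbed points inside each cluster, the centroids tend to stay near the true class regions even when individual training instances are noisy, which is the intuition behind the noise tolerance claimed above.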
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Ougiaroglou, S., Evangelidis, G. (2012). A Simple Noise-Tolerant Abstraction Algorithm for Fast k-NN Classification. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, S.B. (eds) Hybrid Artificial Intelligent Systems. HAIS 2012. Lecture Notes in Computer Science, vol 7209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28931-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28930-9
Online ISBN: 978-3-642-28931-6