Abstract
The k-Nearest Neighbor (k-NN) classifier is a widely used and effective classification method. Its main drawback is the high computational cost incurred when it is applied to large datasets. Many Data Reduction Techniques have been proposed to speed up the classification process; however, their effectiveness depends on the level of noise in the data. This paper shows that the k-means clustering algorithm can serve as a noise-tolerant Data Reduction Technique. The experimental study illustrates that when the reduced dataset consists of the k-means centroids as representatives of the initial data, classification performance is much less degraded by the addition of noise.
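The idea described in the abstract can be sketched as follows: run k-means separately on the instances of each class, keep only the resulting centroids (labeled with their class) as the reduced training set, and classify queries with k-NN over that much smaller set. This is a minimal illustration, not the authors' exact algorithm; the function names and the choice of k centroids per class are assumptions for the sketch.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: return k centroids of the rows of X."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):          # skip empty clusters
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids

def reduce_by_centroids(X, y, k_per_class=3):
    """Build the reduced set: k-means centroids computed per class."""
    reps, rep_labels = [], []
    for c in np.unique(y):
        cents = kmeans(X[y == c], k_per_class)
        reps.append(cents)
        rep_labels.extend([c] * len(cents))
    return np.vstack(reps), np.array(rep_labels)

def knn_predict(reps, rep_labels, query, k=1):
    """Majority vote among the k nearest representatives."""
    dists = np.linalg.norm(reps - query, axis=1)
    nearest = rep_labels[np.argsort(dists)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[counts.argmax()]
```

Because averaging smooths out mislabeled or perturbed points inside each cluster, the centroids tend to stay near the true class regions even when individual training instances are noisy, which is the intuition behind the noise tolerance claimed above.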
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Ougiaroglou, S., Evangelidis, G. (2012). A Simple Noise-Tolerant Abstraction Algorithm for Fast k-NN Classification. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, S.B. (eds) Hybrid Artificial Intelligent Systems. HAIS 2012. Lecture Notes in Computer Science, vol 7209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28931-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28930-9
Online ISBN: 978-3-642-28931-6