Summary
The k-Nearest-Neighbors (kNN) method is a simple but effective approach to classification. Its major drawbacks are (1) low efficiency and (2) dependence on the parameter k. In this paper, we propose a novel similarity-based data reduction method, and several variations of it, aimed at overcoming these shortcomings. Our method constructs a similarity-based model of the data, which replaces the data as the basis for classification. The value of k is determined automatically, varies with the local data distribution, and is optimal in terms of classification accuracy. Constructing the model significantly reduces the amount of data used for learning, which makes classification faster. Experiments on several public data sets show that the proposed methods compare well with other data reduction methods in both efficiency and effectiveness.
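The abstract does not spell out how the model is built. The following minimal Python sketch illustrates the general idea of similarity-based data reduction under stated assumptions, and it is not the authors' exact algorithm. It assumes Euclidean distance and a greedy scheme in which each representative is a training point together with the largest ball of same-class points that contains no point of another class. The names `Representative`, `build_model`, `classify`, and `knn_classify` are hypothetical. The `size` of each representative acts as a locally varying k, and a plain kNN baseline is included for contrast.

```python
import math
from collections import Counter
from dataclasses import dataclass

Point = tuple[float, ...]


@dataclass
class Representative:
    center: Point  # a training point chosen to stand in for its neighborhood
    label: str     # class shared by every point the representative covers
    radius: float  # distance from center to the farthest covered point
    size: int      # number of training points covered (a local, data-driven "k")


def neighborhood(i, X, y):
    """Assumed rule: same-class points strictly closer to X[i] than any other-class point."""
    enemy = min((math.dist(X[i], X[j]) for j in range(len(X)) if y[j] != y[i]),
                default=math.inf)
    members = {j for j in range(len(X)) if y[j] == y[i] and math.dist(X[i], X[j]) < enemy}
    return members | {i}  # always include the point itself so construction makes progress


def build_model(X, y):
    """Greedily pick the representative whose neighborhood covers the most uncovered points."""
    uncovered = set(range(len(X)))
    model = []
    while uncovered:
        best_i, best_cov, best_members = None, -1, None
        for i in sorted(uncovered):
            members = neighborhood(i, X, y)
            cov = len(members & uncovered)
            if cov > best_cov:
                best_i, best_cov, best_members = i, cov, members
        radius = max(math.dist(X[best_i], X[j]) for j in best_members)
        model.append(Representative(X[best_i], y[best_i], radius, len(best_members)))
        uncovered -= best_members
    return model


def classify(model, x):
    """Use the largest covering representative; otherwise the one with the nearest boundary."""
    covering = [r for r in model if math.dist(x, r.center) <= r.radius]
    if covering:
        return max(covering, key=lambda r: r.size).label
    return min(model, key=lambda r: math.dist(x, r.center) - r.radius).label


def knn_classify(X, y, x, k=3):
    """Baseline kNN: scans every training point and votes among the k nearest."""
    nearest = sorted(range(len(X)), key=lambda j: math.dist(x, X[j]))[:k]
    return Counter(y[j] for j in nearest).most_common(1)[0][0]


# Toy data: two well-separated clusters of four points each.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["a"] * 4 + ["b"] * 4
model = build_model(X, y)
print(len(model), classify(model, (2, 2)), knn_classify(X, y, (2, 2)))
```

On this toy data, `build_model` keeps two representatives, so classifying a query compares it against 2 stored balls rather than 8 training points, while the baseline must sort all 8 points for every query with a fixed k.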
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guo, G., Wang, H., Bell, D., Liao, Z. (2005). Similarity-Based Data Reduction and Classification. In: Monitoring, Security, and Rescue Techniques in Multiagent Systems. Advances in Soft Computing, vol 28. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32370-8_16
DOI: https://doi.org/10.1007/3-540-32370-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23245-2
Online ISBN: 978-3-540-32370-9
eBook Packages: Engineering, Engineering (R0)