Similarity-Based Data Reduction and Classification

Guo, Gongde; Wang, Hui; Bell, David; Liao, Zhining

doi:10.1007/3-540-32370-8_16

Part of the book series: Advances in Soft Computing (AINSC, volume 28)

Summary

k-nearest neighbors (kNN) is a simple but effective classification method. Its major drawbacks are (1) low efficiency and (2) dependence on the parameter k. In this paper, we propose a novel similarity-based data reduction method, along with several variations, aimed at overcoming these shortcomings. Our method constructs a similarity-based model of the data, which replaces the data as the basis for classification. The value of k is determined automatically, varies with the local data distribution, and is optimal in terms of classification accuracy. Constructing the model significantly reduces the amount of data needed for learning, thus making classification faster. Experiments on several public data sets show that the proposed methods compare well with other data reduction methods in both efficiency and effectiveness.
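
The summary does not spell out how the model is built, so the following Python sketch is only one plausible reading of the idea, not the authors' published algorithm. It assumes Euclidean distance, a greedy construction that repeatedly keeps the training point whose same-class neighborhood covers the most uncovered points, and a hypothetical representative layout of (center, radius, label, count). The names build_model and classify are illustrative. Under these assumptions, the number of neighbors absorbed by each representative plays the role of a locally chosen k, and classification consults only the representatives instead of the full data set.

```python
import math


def build_model(points, labels):
    """Greedily cover the training set with same-class neighborhoods.

    Returns a list of (center, radius, label, count) tuples. This layout is an
    assumption made for illustration; it is not taken from the paper.
    """
    n = len(points)
    uncovered = set(range(n))
    model = []
    while uncovered:
        best_center, best_members = None, []
        for i in sorted(uncovered):
            # Distance to the nearest point of another class bounds the
            # neighborhood of point i (infinite if there is only one class).
            enemy = min(
                (math.dist(points[i], points[j])
                 for j in range(n) if labels[j] != labels[i]),
                default=math.inf,
            )
            # Uncovered points strictly inside that bound; i always counts.
            members = [
                j for j in sorted(uncovered)
                if j == i or math.dist(points[i], points[j]) < enemy
            ]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        center = points[best_center]
        radius = max(math.dist(center, points[j]) for j in best_members)
        model.append((center, radius, labels[best_center], len(best_members)))
        uncovered -= set(best_members)
    return model


def classify(model, x):
    """Label x using only the representatives in the model."""
    covering = [rep for rep in model if math.dist(rep[0], x) <= rep[1]]
    if covering:
        # Several neighborhoods may cover x; prefer the one built from most points.
        return max(covering, key=lambda rep: rep[3])[2]
    # No neighborhood covers x: use the one whose boundary is nearest.
    return min(model, key=lambda rep: math.dist(rep[0], x) - rep[1])[2]


# Toy data: two well-separated clusters of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
model = build_model(points, labels)
```

On this toy data the eight training points collapse to two representatives, which illustrates the reduction in stored data that the summary attributes to the model. Real data with overlapping classes would produce more, smaller neighborhoods.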

Author information

Authors and Affiliations

School of Computing and Mathematics, University of Ulster, BT37 0QB, UK
Gongde Guo, Hui Wang & Zhining Liao
School of Computer Science, Queen’s University Belfast, BT7 1NN, UK
David Bell
School of Computer and Information Science, Fujian University of Technology, Fuzhou, 350014, China
Gongde Guo

Cite this paper

Guo, G., Wang, H., Bell, D., Liao, Z. (2005). Similarity-Based Data Reduction and Classification. In: Monitoring, Security, and Rescue Techniques in Multiagent Systems. Advances in Soft Computing, vol 28. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32370-8_16

DOI: https://doi.org/10.1007/3-540-32370-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23245-2
Online ISBN: 978-3-540-32370-9
eBook Packages: Engineering, Engineering (R0)
