Abstract
A k nearest neighbor (kNN) classifier assigns a query instance to the most frequent class among its k nearest neighbors in the training instance space. Under an imbalanced class distribution, a query instance is often outnumbered by majority-class instances in its neighborhood and is therefore likely to be classified into the majority class. We propose to identify exemplar minority-class training instances and generalize them to Gaussian balls that serve as concepts for the minority class. The resulting k Exemplar-based Nearest Neighbor (kENN) classifier is thus more sensitive to the minority class. Extensive experiments show that kENN significantly improves the performance of kNN and also outperforms popular re-sampling and cost-sensitive learning strategies for imbalanced classification.
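The exemplar-ball idea in the abstract can be sketched in a few lines of Python. The code below is a minimal illustration, not the authors' published algorithm: it treats every minority-class training instance as an exemplar, sets its ball radius to the distance to its nearest majority-class instance, and shrinks query distances to exemplars by that radius before standard kNN voting. The class name ExemplarKNN, the radius rule, and the blanket exemplar test are all assumptions made for this sketch.

import numpy as np
from collections import Counter

class ExemplarKNN:
    """Hypothetical sketch: kNN with minority exemplars generalized to balls."""

    def __init__(self, k=3, minority_label=1):
        self.k = k
        self.minority_label = minority_label

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        # Assumed radius rule: each minority instance gets a ball reaching
        # to its nearest majority-class instance; majority instances get
        # radius 0 and behave as ordinary kNN points.
        self.radii = np.zeros(len(self.X))
        majority = self.X[self.y != self.minority_label]
        if len(majority) > 0:
            for i in range(len(self.X)):
                if self.y[i] == self.minority_label:
                    self.radii[i] = np.linalg.norm(majority - self.X[i], axis=1).min()
        return self

    def predict(self, X):
        preds = []
        for q in np.asarray(X, dtype=float):
            d = np.linalg.norm(self.X - q, axis=1)
            # Generalized distance: a query near an exemplar ball is pulled
            # toward that minority instance before the vote.
            d = np.maximum(d - self.radii, 0.0)
            nn = np.argsort(d)[: self.k]
            preds.append(Counter(self.y[nn]).most_common(1)[0][0])
        return np.array(preds)

# Toy check: plain 1-NN would label the query 0 (majority); the exemplar
# ball around (5, 5) pulls it to the minority class instead.
if __name__ == "__main__":
    X = [[0, 0], [1, 0], [0, 1], [5, 5]]
    y = [0, 0, 0, 1]
    print(ExemplarKNN(k=1).fit(X, y).predict([[2, 2]]))  # -> [1]

A production version would additionally test each minority instance for exemplar quality (e.g., that its ball covers no or few majority instances) rather than expanding every one, which is why this sketch is more aggressive than the paper's method.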
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, Y., Zhang, X. (2011). Improving k Nearest Neighbor with Exemplar Generalization for Imbalanced Classification. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20847-8_27
DOI: https://doi.org/10.1007/978-3-642-20847-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20846-1
Online ISBN: 978-3-642-20847-8