Abstract
A k nearest neighbor (kNN) classifier assigns a query instance to the most frequent class among its k nearest neighbors in the training instance space. Under an imbalanced class distribution, a query instance is often outnumbered by majority-class instances in its neighborhood and is therefore likely to be classified into the majority class. We propose to identify exemplar minority-class training instances and generalize them to Gaussian balls that serve as concepts for the minority class. The resulting k Exemplar-based Nearest Neighbor (kENN) classifier is thus more sensitive to the minority class. Extensive experiments show that kENN significantly improves the performance of kNN and also outperforms popular re-sampling and cost-sensitive learning strategies for imbalanced classification.
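The exemplar-ball idea in the abstract can be sketched in a few lines of Python. The code below is a minimal illustration, not the authors' published algorithm: it treats every minority-class training instance as an exemplar, sets its ball radius to the distance to its nearest majority-class instance, and shrinks query distances to exemplars by that radius before standard kNN voting. The class name ExemplarKNN, the radius rule, and the blanket exemplar test are all assumptions made for this sketch.

import numpy as np
from collections import Counter

class ExemplarKNN:
    """Hypothetical sketch: kNN with minority exemplars generalized to balls."""

    def __init__(self, k=3, minority_label=1):
        self.k = k
        self.minority_label = minority_label

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        # Assumed radius rule: each minority instance gets a ball reaching
        # to its nearest majority-class instance; majority instances get
        # radius 0 and behave as ordinary kNN points.
        self.radii = np.zeros(len(self.X))
        majority = self.X[self.y != self.minority_label]
        if len(majority) > 0:
            for i in range(len(self.X)):
                if self.y[i] == self.minority_label:
                    self.radii[i] = np.linalg.norm(majority - self.X[i], axis=1).min()
        return self

    def predict(self, X):
        preds = []
        for q in np.asarray(X, dtype=float):
            d = np.linalg.norm(self.X - q, axis=1)
            # Generalized distance: a query near an exemplar ball is pulled
            # toward that minority instance before the vote.
            d = np.maximum(d - self.radii, 0.0)
            nn = np.argsort(d)[: self.k]
            preds.append(Counter(self.y[nn]).most_common(1)[0][0])
        return np.array(preds)

# Toy check: plain 1-NN would label the query 0 (majority); the exemplar
# ball around (5, 5) pulls it to the minority class instead.
if __name__ == "__main__":
    X = [[0, 0], [1, 0], [0, 1], [5, 5]]
    y = [0, 0, 0, 1]
    print(ExemplarKNN(k=1).fit(X, y).predict([[2, 2]]))  # -> [1]

A production version would additionally test each minority instance for exemplar quality (e.g., that its ball covers no or few majority instances) rather than expanding every one, which is why this sketch is more aggressive than the paper's method.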
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, Y., Zhang, X. (2011). Improving k Nearest Neighbor with Exemplar Generalization for Imbalanced Classification. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20847-8_27
DOI: https://doi.org/10.1007/978-3-642-20847-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20846-1
Online ISBN: 978-3-642-20847-8