
Improving k Nearest Neighbor with Exemplar Generalization for Imbalanced Classification

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6635)


Abstract

A k nearest neighbor (kNN) classifier assigns a query instance to the most frequent class among its k nearest neighbors in the training instance space. Under an imbalanced class distribution, a query instance is often overwhelmed by majority-class instances in its neighborhood and is therefore likely to be assigned to the majority class. We propose to identify exemplar minority-class training instances and generalize them to Gaussian balls as concepts for the minority class. The resulting k Exemplar-based Nearest Neighbor (kENN) classifier is thus more sensitive to the minority class. Extensive experiments show that kENN significantly improves the performance of kNN and also outperforms popular re-sampling and cost-sensitive learning strategies for imbalanced classification.
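
The abstract describes the method only at a high level. The following is a minimal sketch of the underlying idea, assuming Euclidean distance and a simple rule in which a query's distance to a generalized exemplar is measured to the surface of its ball (zero inside it); the function kenn_predict and the radii array are illustrative names, and the paper's actual procedure for identifying exemplars and fitting Gaussian balls is not reproduced here.

```python
import numpy as np

def kenn_predict(X_train, y_train, radii, query, k=5):
    """Vote over the k nearest training instances, where each
    minority-class exemplar is generalized to a ball: its distance
    to the query is measured to the ball surface, not the centre."""
    dist = np.linalg.norm(X_train - query, axis=1)
    # Subtract each instance's ball radius (0 for non-exemplars),
    # clipping at zero so queries inside a ball get distance 0.
    dist = np.maximum(dist - radii, 0.0)
    nearest = np.argsort(dist)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage: two majority instances, one generalized minority exemplar.
X = np.array([[0.0, 0.0], [0.2, 0.0], [2.0, 0.0]])
y = np.array([0, 0, 1])          # class 1 is the minority class
r = np.array([0.0, 0.0, 1.5])    # only the exemplar carries a radius
print(kenn_predict(X, y, r, np.array([1.0, 0.0]), k=1))  # -> 1
```

In the toy call, the exemplar's radius pulls the query at (1, 0) inside its ball, so the exemplar becomes the query's nearest neighbor even though a majority instance is strictly closer in raw distance.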





Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Y., Zhang, X. (2011). Improving k Nearest Neighbor with Exemplar Generalization for Imbalanced Classification. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science (LNAI), vol 6635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20847-8_27


  • DOI: https://doi.org/10.1007/978-3-642-20847-8_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20846-1

  • Online ISBN: 978-3-642-20847-8

  • eBook Packages: Computer Science, Computer Science (R0)
