Distance Metric Learning with Prototype Selection for Imbalanced Classification

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2021)

Abstract

Distance metric learning has recently become popular because it can significantly improve similarity-based learning methods, such as the nearest neighbors classifier. Most proposals on this topic focus on standard supervised learning and weakly supervised learning problems. In this paper, we propose a distance metric learning method that handles imbalanced classification via prototype selection. Our method, which we call condensed neighborhood components analysis (CNCA), improves on the classic neighborhood components analysis by incorporating the foundations of the condensed nearest neighbors undersampling method. We show how to implement this algorithm and provide a Python implementation. We have also evaluated it on imbalanced classification problems, where it performs very well according to several score metrics designed for imbalanced data.
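
As a rough illustration of the prototype-selection ingredient only, Hart's condensed nearest neighbor rule [16] can be sketched in plain Python as follows. This is not the CNCA algorithm of the paper, nor the authors' implementation: the function names `nearest_label` and `condensed_nn` and the toy data are assumptions made for illustration, and distances are plain Euclidean rather than learned.

```python
import math


def nearest_label(store, x):
    """Return the label of the stored prototype closest to x (1-NN, Euclidean)."""
    closest = min(store, key=lambda proto: math.dist(proto[0], x))
    return closest[1]


def condensed_nn(X, y):
    """Sketch of Hart's condensed nearest neighbor rule.

    Returns the sorted indices of a subset of the training set that
    classifies every training sample correctly with 1-NN.
    """
    store = [(X[0], y[0])]  # seed the store with the first sample (an arbitrary choice)
    kept = {0}
    changed = True
    while changed:  # repeat full passes until a pass adds nothing
        changed = False
        for i, (x, label) in enumerate(zip(X, y)):
            if i in kept:
                continue
            # A sample misclassified by the current store becomes a prototype.
            if nearest_label(store, x) != label:
                store.append((x, label))
                kept.add(i)
                changed = True
    return sorted(kept)


# Hypothetical one-dimensional toy data: four samples of class 0, two of class 1.
X = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,)]
y = [0, 0, 0, 0, 1, 1]
kept = condensed_nn(X, y)
print(kept)  # [0, 4]
```

On this toy set one prototype per class is enough to classify all six samples, so the larger class shrinks from four samples to one. A method in the spirit of the abstract would measure distances under a learned metric instead of `math.dist`; how CNCA couples the two steps is described in the paper itself.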

Our work has been supported by the research project PID2020-119478GB-I00 and by a research scholarship (FPU18/05989), given to the author Juan Luis Suárez by the Spanish Ministry of Science, Innovation and Universities.

Notes

  1. KEEL, knowledge extraction based on evolutionary learning [22]: http://www.keel.es/.

References

  1. Benavoli, A., Corani, G., Demšar, J., Zaffalon, M.: Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis. J. Mach. Learn. Res. 18(1), 2653–2688 (2017)

  2. Benavoli, A., Corani, G., Mangili, F., Zaffalon, M., Ruggeri, F.: A Bayesian Wilcoxon signed-rank test based on the Dirichlet process. In: International Conference on Machine Learning, pp. 1026–1034 (2014)

  3. Branco, P., Torgo, L., Ribeiro, R.P.: A survey of predictive modeling on imbalanced domains. ACM Comput. Surv. (CSUR) 49(2), 1–50 (2016)

  4. Carrasco, J., García, S., del Mar Rueda, M., Herrera, F.: rNPBST: an R package covering non-parametric and Bayesian statistical tests. In: Martínez de Pisón, F.J., Urraca, R., Quintián, H., Corchado, E. (eds.) HAIS 2017. LNCS (LNAI), vol. 10334, pp. 281–292. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59650-1_24

  5. Chang, F., Lin, C.C., Lu, C.J.: Adaptive prototype learning algorithms: theoretical and experimental studies. J. Mach. Learn. Res. 7(10), 2125–2148 (2006)

  6. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)

  7. Cover, T.M., Hart, P.E., et al.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967)

  8. Cunningham, J.P., Ghahramani, Z.: Linear dimensionality reduction: survey, insights, and generalizations. J. Mach. Learn. Res. 16(1), 2859–2900 (2015)

  9. Devi, V.S., Murty, M.N.: An incremental prototype set building technique. Pattern Recognit. 35(2), 505–513 (2002)

  10. Feng, L., Wang, H., Jin, B., Li, H., Xue, M., Wang, L.: Learning a distance metric by balancing KL-divergence for imbalanced datasets. IEEE Trans. Syst. Man Cybern. Syst. 99, 1–12 (2018)

  11. Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F.: Learning from Imbalanced Data Sets. Springer, Heidelberg (2018)

  12. Fernández, A., García, S., Herrera, F., Chawla, N.V.: SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. J. Artif. Intell. Res. 61, 863–905 (2018)

  13. Gates, G.: The reduced nearest neighbor rule (corresp.). IEEE Trans. Inf. Theory 18(3), 431–433 (1972)

  14. Gautheron, L., Habrard, A., Morvant, E., Sebban, M.: Metric learning from imbalanced data with generalization guarantees. Pattern Recognit. Lett. 133, 298–304 (2020)

  15. Goldberger, J., Hinton, G.E., Roweis, S., Salakhutdinov, R.R.: Neighbourhood components analysis. Adv. Neural Inf. Process. Syst. 17, 513–520 (2004)

  16. Hart, P.: The condensed nearest neighbor rule (corresp.). IEEE Trans. Inf. Theory 14(3), 515–516 (1968)

  17. Li, Z., Zhang, J., Yao, X., Kou, G.: How to identify early defaults in online lending: a cost-sensitive multi-layer learning framework. Knowl.-Based Syst. 221, 106963 (2021)

  18. Lin, Y., Lee, Y., Wahba, G.: Support vector machines for classification in nonstandard situations. Mach. Learn. 46(1–3), 191–202 (2002)

  19. Suárez, J.L., García, S., Herrera, F.: pyDML: a Python library for distance metric learning. J. Mach. Learn. Res. 21(96), 1–7 (2020)

  20. Suárez, J.L., García, S., Herrera, F.: A tutorial on distance metric learning: mathematical foundations, algorithms, experimental analysis, prospects and challenges. Neurocomputing 425, 300–322 (2021)

  21. Tomek, I.: Two modifications of CNN. IEEE Trans. Syst. Man Cybern. 6, 769–772 (1976)

  22. Triguero, I., et al.: KEEL 3.0: an open source software for multi-stage analysis in data mining. Int. J. Comput. Intell. Syst. 10, 1238–1249 (2017)

  23. Wang, H., Xu, Y., Chen, Q., Wang, X.: Diagnosis of complications of type 2 diabetes based on weighted multi-label small sphere and large margin machine. Appl. Intell. 51(1), 223–236 (2020). https://doi.org/10.1007/s10489-020-01824-y

  24. Wang, N., Zhao, X., Jiang, Y., Gao, Y.: Iterative metric learning for imbalance data classification. In: IJCAI, pp. 2805–2811 (2018)

  25. Weinberger, K.Q., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 10(2), 207–244 (2009)

  26. Xing, E., Jordan, M., Russell, S.J., Ng, A.: Distance metric learning with application to clustering with side-information. Adv. Neural Inf. Process. Syst. 15, 521–528 (2002)

Author information

Corresponding author

Correspondence to Juan Luis Suárez.

Ethics declarations

Declaration of competing interest

The authors declare that there is no conflict of interest.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Suárez, J.L., García, S., Herrera, F. (2021). Distance Metric Learning with Prototype Selection for Imbalanced Classification. In: Sanjurjo González, H., Pastor López, I., García Bringas, P., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2021. Lecture Notes in Computer Science, vol 12886. Springer, Cham. https://doi.org/10.1007/978-3-030-86271-8_33

  • DOI: https://doi.org/10.1007/978-3-030-86271-8_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86270-1

  • Online ISBN: 978-3-030-86271-8

  • eBook Packages: Computer Science, Computer Science (R0)
