Abstract
Distance metric learning has recently become popular because it can significantly improve similarity-based learning methods, such as the nearest neighbors classifier. Most proposals on this topic focus on standard supervised and weakly supervised learning problems. In this paper, we propose a distance metric learning method that handles imbalanced classification via prototype selection. Our method, which we call condensed neighborhood components analysis (CNCA), improves on the classic neighborhood components analysis by adding the foundations of the condensed nearest neighbors undersampling method. We show how to implement this algorithm and provide a Python implementation. We have also evaluated it on imbalanced classification problems, where it performed very well according to several score metrics for imbalanced classification.
Our work has been supported by the research project PID2020-119478GB-I00 and by a research scholarship (FPU18/05989), awarded to the author Juan Luis Suárez by the Spanish Ministry of Science, Innovation and Universities.
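The abstract names the condensed nearest neighbors undersampling method as one of the two building blocks of CNCA. As a minimal sketch of that building block only, the code below follows the idea of Hart's condensed nearest neighbor rule (Hart, 1968, in the references): a sample joins the prototype store only if the current store misclassifies it under 1-NN. The function names `nearest_label` and `condense`, the `minority_label` parameter, the Euclidean distance, and the choice to keep every minority sample are illustrative assumptions. This is not the CNCA algorithm itself, nor the API of the authors' Python implementation.

```python
import math
import random


def nearest_label(point, store):
    """Return the label of the stored prototype closest to `point` (1-NN)."""
    return min(store, key=lambda proto: math.dist(point, proto[0]))[1]


def condense(samples, minority_label, seed=0):
    """Hart-style condensation used as undersampling (illustrative sketch).

    `samples` is a list of (features, label) pairs. Majority samples are
    added to the store only when the current store misclassifies them.
    """
    rng = random.Random(seed)
    majority = [s for s in samples if s[1] != minority_label]
    rng.shuffle(majority)
    # Assumption: seed the store with all minority samples plus one majority sample.
    store = [s for s in samples if s[1] == minority_label] + majority[:1]
    pending = majority[1:]
    changed = True
    while changed:  # repeat passes until a full pass adds nothing
        changed = False
        still_pending = []
        for features, label in pending:
            if nearest_label(features, store) != label:
                store.append((features, label))  # misclassified: keep as prototype
                changed = True
            else:
                still_pending.append((features, label))
        pending = still_pending
    return store


# Toy data: 2 minority points near the origin, 20 majority points far away.
samples = [((0.0, 0.0), 1), ((0.2, 0.1), 1)]
samples += [((5.0 + 0.1 * i, 5.0), 0) for i in range(20)]
store = condense(samples, minority_label=1)
print(len(store), "prototypes kept out of", len(samples))
```

On this toy data the majority class is well separated, so a single majority prototype is enough for 1-NN to classify every original sample correctly, and the store stays much smaller than the input.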
Notes
1. KEEL, knowledge extraction based on evolutionary learning [22]: http://www.keel.es/.
References
Benavoli, A., Corani, G., Demšar, J., Zaffalon, M.: Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis. J. Mach. Learn. Res. 18(1), 2653–2688 (2017)
Benavoli, A., Corani, G., Mangili, F., Zaffalon, M., Ruggeri, F.: A Bayesian Wilcoxon signed-rank test based on the Dirichlet process. In: International Conference on Machine Learning, pp. 1026–1034 (2014)
Branco, P., Torgo, L., Ribeiro, R.P.: A survey of predictive modeling on imbalanced domains. ACM Comput. Surv. (CSUR) 49(2), 1–50 (2016)
Carrasco, J., García, S., del Mar Rueda, M., Herrera, F.: rNPBST: an R package covering non-parametric and Bayesian statistical tests. In: Martínez de Pisón, F.J., Urraca, R., Quintián, H., Corchado, E. (eds.) HAIS 2017. LNCS (LNAI), vol. 10334, pp. 281–292. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59650-1_24
Chang, F., Lin, C.C., Lu, C.J.: Adaptive prototype learning algorithms: theoretical and experimental studies. J. Mach. Learn. Res. 7(10), 2125–2148 (2006)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967)
Cunningham, J.P., Ghahramani, Z.: Linear dimensionality reduction: survey, insights, and generalizations. J. Mach. Learn. Res. 16(1), 2859–2900 (2015)
Devi, V.S., Murty, M.N.: An incremental prototype set building technique. Pattern Recognit. 35(2), 505–513 (2002)
Feng, L., Wang, H., Jin, B., Li, H., Xue, M., Wang, L.: Learning a distance metric by balancing KL-divergence for imbalanced datasets. IEEE Trans. Syst. Man Cybern. Syst. 99, 1–12 (2018)
Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F.: Learning from Imbalanced Data Sets. Springer, Heidelberg (2018)
Fernández, A., García, S., Herrera, F., Chawla, N.V.: SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. J. Artif. Intell. Res. 61, 863–905 (2018)
Gates, G.: The reduced nearest neighbor rule (corresp.). IEEE Trans. Inf. Theory 18(3), 431–433 (1972)
Gautheron, L., Habrard, A., Morvant, E., Sebban, M.: Metric learning from imbalanced data with generalization guarantees. Pattern Recognit. Lett. 133, 298–304 (2020)
Goldberger, J., Hinton, G.E., Roweis, S., Salakhutdinov, R.R.: Neighbourhood components analysis. Adv. Neural Inf. Process. Syst. 17, 513–520 (2004)
Hart, P.: The condensed nearest neighbor rule (corresp.). IEEE Trans. Inf. Theory 14(3), 515–516 (1968)
Li, Z., Zhang, J., Yao, X., Kou, G.: How to identify early defaults in online lending: a cost-sensitive multi-layer learning framework. Knowl.-Based Syst. 221, 106963 (2021)
Lin, Y., Lee, Y., Wahba, G.: Support vector machines for classification in nonstandard situations. Mach. Learn. 46(1–3), 191–202 (2002)
Suárez, J.L., García, S., Herrera, F.: pyDML: a Python library for distance metric learning. J. Mach. Learn. Res. 21(96), 1–7 (2020)
Suárez, J.L., García, S., Herrera, F.: A tutorial on distance metric learning: mathematical foundations, algorithms, experimental analysis, prospects and challenges. Neurocomputing 425, 300–322 (2021)
Tomek, I.: Two modifications of CNN. IEEE Trans. Syst. Man Cybern. 6, 769–772 (1976)
Triguero, I., et al.: KEEL 3.0: an open source software for multi-stage analysis in data mining. Int. J. Comput. Intell. Syst. 10, 1238–1249 (2017)
Wang, H., Xu, Y., Chen, Q., Wang, X.: Diagnosis of complications of type 2 diabetes based on weighted multi-label small sphere and large margin machine. Appl. Intell. 51(1), 223–236 (2020). https://doi.org/10.1007/s10489-020-01824-y
Wang, N., Zhao, X., Jiang, Y., Gao, Y.: Iterative metric learning for imbalance data classification. In: IJCAI, pp. 2805–2811 (2018)
Weinberger, K.Q., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 10(2), 207–244 (2009)
Xing, E., Jordan, M., Russell, S.J., Ng, A.: Distance metric learning with application to clustering with side-information. Adv. Neural Inf. Process. Syst. 15, 521–528 (2002)
Ethics declarations
Declaration of competing interest
The authors declare that there is no conflict of interest.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Suárez, J.L., García, S., Herrera, F. (2021). Distance Metric Learning with Prototype Selection for Imbalanced Classification. In: Sanjurjo González, H., Pastor López, I., García Bringas, P., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2021. Lecture Notes in Computer Science, vol 12886. Springer, Cham. https://doi.org/10.1007/978-3-030-86271-8_33
DOI: https://doi.org/10.1007/978-3-030-86271-8_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86270-1
Online ISBN: 978-3-030-86271-8
eBook Packages: Computer Science, Computer Science (R0)