Oversampling Method Based Covariance Matrix Estimation in High-Dimensional Imbalanced Classification

Leguen-de-Varona, Ireimis; Madera, Julio; Gonzalez, Hector; Tubex, Lise; Verdonck, Tim

doi:10.1007/978-3-031-49552-6_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14335)

Included in the following conference series:

International Workshop on Artificial Intelligence and Pattern Recognition

Abstract

Class imbalance is a common problem in (binary) classification. It appears in many application domains, such as text classification, fraud detection, churn prediction and medical diagnosis. A widely used approach to cope with this problem at the data level is the Synthetic Minority Oversampling Technique (SMOTE), which uses the K-Nearest Neighbors (KNN) algorithm to generate new, artificial instances in the minority class. SMOTE is, however, known to be ill-suited to high-dimensional data. We therefore propose an alternative oversampling strategy for imbalanced classification problems in high dimensions. Our approach is based on the sparse inverse covariance matrix estimated through the Ledoit-Wolf method for high-dimensional data. The results show that our proposal performs competitively against popular alternatives.
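
The abstract does not spell out the sampling procedure, so the following is only a rough sketch of what covariance-based oversampling could look like, not the authors' algorithm. It assumes that synthetic minority instances are drawn from a multivariate Gaussian whose mean and covariance are fitted to the minority class with scikit-learn's `LedoitWolf` shrinkage estimator. The function name `oversample_minority`, the Gaussian assumption, and the toy data are all illustrative choices that do not come from the paper.

```python
import numpy as np
from sklearn.covariance import LedoitWolf


def oversample_minority(X_min, n_new, seed=0):
    """Draw n_new synthetic rows from a Gaussian fitted to the minority class.

    Assumption (not from the paper): the minority class is modelled as a
    single Gaussian with a Ledoit-Wolf shrinkage covariance estimate.
    """
    rng = np.random.default_rng(seed)
    # Shrinkage keeps the estimate well-conditioned even when the number of
    # features exceeds the number of minority samples.
    lw = LedoitWolf().fit(X_min)
    return rng.multivariate_normal(lw.location_, lw.covariance_, size=n_new)


# Toy high-dimensional minority class: 20 samples, 50 features.
rng = np.random.default_rng(42)
X_min = rng.normal(size=(20, 50))

# Generate 80 synthetic rows and append them to the original minority data.
X_new = oversample_minority(X_min, n_new=80)
X_balanced = np.vstack([X_min, X_new])
```

With 20 samples and 50 features the plain sample covariance would be singular, which is the situation a shrinkage estimator is meant to handle. The fitted estimator also exposes the inverse covariance as `precision_`, which may be closer to the quantity the abstract refers to.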



Acknowledgments

We would like to thank VLIR (Vlaamse Interuniversitaire Raad, Flemish Interuniversity Council, Belgium) for supporting this work under the Cuban ICT NETWORK programme project “Strengthening the ICT role in Cuban Universities for the development of the society”, specifically Project 1: “Strengthening the research on ICT and its knowledge transference to the Cuban society (RESICT)”, and also the Cuban national project “Plataforma para el análisis de grandes volúmenes de datos y su aplicación a sectores estratégicos”.

Author information

Authors and Affiliations

Universidad de Camagüey “Ignacio Agramonte Loynaz”, Camagüey, Cuba
Ireimis Leguen-de-Varona & Julio Madera
Universidad de las Ciencias Informáticas (UCI), Havana, La Habana, Cuba
Hector Gonzalez
University of Antwerp, Antwerp, Belgium
Lise Tubex & Tim Verdonck

Corresponding author

Correspondence to Ireimis Leguen-de-Varona.

Editor information

Editors and Affiliations

Universidad de las Ciencias Informáticas, Havana, Cuba
Yanio Hernández Heredia
Universidad de las Ciencias Informáticas, Havana, Cuba
Vladimir Milián Núñez
Universidad de las Ciencias Informáticas, Havana, Cuba
José Ruiz Shulcloper

Cite this paper

Leguen-de-Varona, I., Madera, J., Gonzalez, H., Tubex, L., Verdonck, T. (2024). Oversampling Method Based Covariance Matrix Estimation in High-Dimensional Imbalanced Classification. In: Hernández Heredia, Y., Milián Núñez, V., Ruiz Shulcloper, J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2023. Lecture Notes in Computer Science, vol 14335. Springer, Cham. https://doi.org/10.1007/978-3-031-49552-6_2

DOI: https://doi.org/10.1007/978-3-031-49552-6_2
Published: 20 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49551-9
Online ISBN: 978-3-031-49552-6
eBook Packages: Computer Science, Computer Science (R0)
