Imbalanced multi-label data classification as a bi-level optimization problem: application to miRNA-related diseases diagnosis

Chabbouh, Marwa; Bechikh, Slim; Mezura-Montes, Efrén; Said, Lamjed Ben

doi:10.1007/s00521-023-08458-4

Original Article
Published: 22 April 2023

Volume 35, pages 16285–16303 (2023)

Neural Computing and Applications

Marwa Chabbouh ORCID: orcid.org/0000-0002-8534-6312¹,
Slim Bechikh²,
Efrén Mezura-Montes³ &
…
Lamjed Ben Said¹

Abstract

In multi-label classification, each instance can be assigned several labels at the same time. In this setting, the relationships between labels and the class imbalance are two serious issues that must be addressed. Despite the large number of existing multi-label classification methods, the widespread class imbalance among labels has not been adequately addressed. Two main issues must be solved to obtain an effective classifier for imbalanced multi-label data. On the one hand, imbalance can occur between labels and/or within a label: “between-labels imbalance” refers to imbalance across different labels, whereas “within-label imbalance” refers to imbalance inside a single label, and it can affect several labels at once. On the other hand, the order in which labels are processed heavily influences the quality of a multi-label classifier. To deal with these challenges, we propose a bi-level evolutionary approach for the optimized induction of multivariate decision trees, in which the upper level designs the classifiers while the lower level approximates the optimal label ordering for each classifier. Our proposed method, named BIMLC-GA (Bi-level Imbalanced Multi-Label Classification Genetic Algorithm), is compared to several state-of-the-art methods on a variety of imbalanced multi-label data sets from several application fields and then applied to the miRNA-related diseases case study. The statistical analysis of the obtained results shows the merits of our proposal.
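
To make the two kinds of imbalance named in the abstract concrete, the following minimal Python sketch computes one ratio for each on a toy label matrix. It is an illustration only, not the measure used by BIMLC-GA: the between-labels ratio follows the common IRLbl-style convention (count of the most frequent label divided by the count of each label), the within-label ratio is assumed here to be the majority/minority ratio of negatives and positives inside a label, and the function names and the matrix `Y_TOY` are hypothetical.

```python
from typing import List


def label_counts(Y: List[List[int]]) -> List[int]:
    """Number of positive instances per label (column sums of a 0/1 matrix)."""
    return [sum(col) for col in zip(*Y)]


def between_label_ratios(Y: List[List[int]]) -> List[float]:
    """IRLbl-style ratio (assumed convention): count of the most frequent
    label divided by the count of each label. 1.0 marks the most frequent label."""
    counts = label_counts(Y)
    top = max(counts)
    return [top / c if c else float("inf") for c in counts]


def within_label_ratios(Y: List[List[int]]) -> List[float]:
    """Majority/minority ratio inside each label (assumed definition):
    compares positives with negatives of the same label."""
    n = len(Y)
    ratios = []
    for c in label_counts(Y):
        minority, majority = sorted((c, n - c))
        ratios.append(majority / minority if minority else float("inf"))
    return ratios


# Hypothetical toy data: 6 instances (rows), 3 labels (columns).
Y_TOY = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
]

if __name__ == "__main__":
    print("between-labels:", between_label_ratios(Y_TOY))
    print("within-label:  ", within_label_ratios(Y_TOY))
```

In this toy matrix the first label is the most frequent one, so its between-labels ratio is 1.0, yet it is still skewed internally (five positives against one negative). The third label shows the opposite pattern: it is rarer than the first label but perfectly balanced within itself. This is why the two notions need to be tracked separately.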

Data Availability

The data sets analysed during the current study are available at http://www.uco.es/kdis/mllresources/. The real human miRNA-disease associations were retrieved from the HMDD v3.0 database [52].

References

1. Sun J, Lang J, Fujita H, Li H (2018) Imbalanced enterprise credit evaluation with dte-sbd: decision tree ensemble based on smote and bagging with differentiated sampling rates. Inf Sci 425:76–91
2. Bi J, Zhang C (2018) An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme. Knowledge-Based Syst 158:81–93
3. Zhang C, Bi J, Xu S, Ramentol E, Fan G, Qiao B, Fujita H (2019) Multi-imbalance: an open-source software for multi-class imbalance learning. Knowledge-Based Syst 174:137–143
4. Zhang M-L, Zhou Z-H (2007) Ml-knn: a lazy learning approach to multi-label learning. Pattern Recognit 40(7):2038–2048
5. Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333
6. Dembczynski K, Cheng W, Hüllermeier E (2010) Bayes optimal multilabel classification via probabilistic classifier chains. In: ICML, pp. 279–286
7. Read J, Martino L, Luengo D (2013) Efficient monte carlo optimization for multi-label classifier chains. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3457–3461. IEEE
8. Hernandez-Leal P, Orihuela-Espina F, Sucar E, Morales EF (2012) Hybrid binary-chain multi-label classifiers. In: Proceedings of the 6th European Workshop on Probabilistic Graphical Models, pp. 139–146. Citeseer
9. Madjarov G, Kocev D, Gjorgjevikj D, Džeroski S (2012) An extensive experimental comparison of methods for multi-label learning. Pattern Recognit 45(9):3084–3104
10. Tsoumakas G, Partalas I, Vlahavas I (2008) A taxonomy and short review of ensemble selection. In: Workshop on Supervised and Unsupervised Ensemble Methods and Their Applications, pp. 1–6
11. Gibaja E, Ventura S (2015) A tutorial on multilabel learning. ACM Comput Surv (CSUR) 47(3):1–38
12. Colson B, Marcotte P, Savard G (2007) An overview of bilevel optimization. Annal Op Res 153(1):235–256
13. Cerrada M, Sánchez R-V, Pacheco F, Cabrera D, Zurita G, Li C (2016) Hierarchical feature selection based on relative dependency for gear fault diagnosis. Appl Intell 44(3):687–703
14. Bennett KP, Kunapuli G, Hu J, Pang J-S (2008) Bilevel optimization and machine learning. In: IEEE World Congress on Computational Intelligence, pp. 25–47. Springer
15. Weng W, Li Y-W, Liu J-H, Wu S-X, Chen C-L (2021) Multi-label classification review and opportunities. J Netw Intell 6(2):255–275
16. Tahir MA, Kittler J, Yan F (2012) Inverse random under sampling for class imbalance problem and its application to multi-label classification. Pattern Recognit 45(10):3738–3750
17. Charte F, Rivera AJ, del Jesus MJ, Herrera F (2015) Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing 163:3–16
18. Li L, Wang H (2016) Towards label imbalance in multi-label classification with many labels. http://arxiv.org/abs/1604.01304
19. Moyano JM, Gibaja EL, Cios KJ, Ventura S (2020) Combining multi-label classifiers based on projections of the output space using evolutionary algorithms. Knowledge-Based Syst 196:105770
20. Rastin N, Jahromi MZ, Taheri M (2020) A generalized weighted distance k-nearest neighbor for multi-label problems. Pattern Recognit 45:107526
21. Cheng K, Gao S, Dong W, Yang X, Wang Q, Yu H (2020) Boosting label weighted extreme learning machine for classifying multi-label imbalanced data. Neurocomputing 403:360–370
22. Zhang M-L, Li Y-K, Yang H, Liu X-Y (2020) Towards class-imbalance aware multi-label learning. IEEE Trans Cybernet 52:4459
23. Charte F, Rivera AJ, del Jesus MJ, Herrera F (2019) Remedial-hwr: Tackling multilabel imbalance through label decoupling and data resampling hybridization. Neurocomputing 326:110–122
24. Ding M, Yang Y, Lan Z (2018) Multi-label imbalanced classification based on assessments of cost and value. Appl Intell 48(10):3577–3590
25. Tao Y, Jiang B, Xue L, Xie C, Zhang Y (2021) Evolutionary synthetic oversampling technique and cocktail ensemble model for warfarin dose prediction with imbalanced data. Neural Comput Appl 33(17):11203–11221
26. Slowik A, Kwasnicka H (2020) Evolutionary algorithms and their applications to engineering problems. Neural Comput Appl 32(16):12363–12379
27. Moyano JM, Gibaja EL, Cios KJ, Ventura S (2019) An evolutionary approach to build ensembles of multi-label classifiers. Inf Fusion 50:168–180
28. Moyano JM, Gibaja EL, Cios KJ, Ventura S (2020) Generating ensembles of multi-label classifiers using cooperative coevolutionary algorithms. In: ECAI 2020, pp. 1379–1386. IOS Press
29. Cerri R, Basgalupp MP, Barros RC, de Carvalho AC (2019) Inducing hierarchical multi-label classification rules with genetic algorithms. Appl Soft Comput 77:584–604
30. Omozaki Y, Masuyama N, Nojima Y, Ishibuchi H (2020) Multiobjective fuzzy genetics-based machine learning for multi-label classification. In: 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–8. IEEE
31. Zitzler E, Künzli S (2004) Indicator-based selection in multiobjective search. In: International Conference on Parallel Problem Solving from Nature, pp. 832–842. Springer
32. Basseur M, Burke EK (2007) Indicator-based multi-objective local search. In: 2007 IEEE Congress on Evolutionary Computation, pp. 3100–3107. IEEE
33. Chawla NV (2009) Data mining for imbalanced datasets: an overview. Data mining and knowledge discovery handbook, 875–886
34. Said R, Bechikh S, Louati A, Aldaej A, Said LB (2020) Solving combinatorial multi-objective bi-level optimization problems using multiple populations and migration schemes. IEEE Access 8:141674–141695
35. Chaabani A, Bechikh S, Said LB (2018) A new co-evolutionary decomposition-based algorithm for bi-level combinatorial optimization. Appl Intell 48(9):2847–2872
36. Gad AF (2021) Pygad: an intuitive genetic algorithm python library. http://arxiv.org/abs/2106.06158
37. Olson RS, Moore JH (2016) Tpot: a tree-based pipeline optimization tool for automating machine learning. In: Workshop on Automatic Machine Learning, pp. 66–74. PMLR
38. Read J (2010) Scalable multi-label classification. PhD thesis, University of Waikato
39. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
40. Garcia S, Herrera F (2008) An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. J Mach Learn Res 9:2677–2694
41. García S, Fernández A, Luengo J, Herrera F (2009) A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Comput 13(10):959
42. Sheskin DJ (2003) Handbook of parametric and nonparametric statistical procedures. Chapman and Hall/CRC, UK
43. Triguero I, González S, Moyano JM, García S, Alcalá-Fdez J, Luengo J, Fernández A, del Jesús MJ, Sánchez L, Herrera F (2017) Keel 3.0: an open source software for multi-stage analysis in data mining. Int J Comput Intell Syst 10(1):1238–1249
44. Holm S (1979) A simple sequentially rejective multiple test procedure. Scandinavian J Stat 45:65–70
45. Shaffer JP (1986) Modified sequentially rejective multiple test procedures. J Am Stat Assoc 81(395):826–831
46. Ambros V (2004) The functions of animal micrornas. Nature 431(7006):350–355
47. Bartel DP (2004) Micrornas: genomics, biogenesis, mechanism, and function. Cell 116(2):281–297
48. Kozomara A, Griffiths-Jones S (2014) mirbase: annotating high confidence micrornas using deep sequencing data. Nucl Acids Res 42(D1):68–73
49. Friedman RC, Farh KK-H, Burge CB, Bartel DP (2009) Most mammalian mrnas are conserved targets of micrornas. Genome Res 19(1):92–105
50. Esteller M (2011) Non-coding rnas in human disease. Nat Rev Genetics 12(12):861–874
51. Stricker M, Asim MN, Dengel A, Ahmed S (2021) Circnet: an encoder-decoder-based convolution neural network (cnn) for circular rna identification. Neural Comput Appl 10:1–12
52. Huang Z, Shi J, Gao Y, Cui C, Zhang S, Li J, Zhou Y, Cui Q (2019) Hmdd v3.0: a database for experimentally supported human microrna-disease associations. Nucl Acids Res 47(D1):1013–1017
53. Kibbe WA, Arze C, Felix V, Mitraka E, Bolton E, Fu G, Mungall CJ, Binder JX, Malone J, Vasant D et al (2015) Disease ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucl Acids Res 43(D1):1071–1078

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

SMART Lab, ISG-Campus, University of Tunis, Liberty Street, 2000, Bardo, Tunis, Tunisia
Marwa Chabbouh & Lamjed Ben Said
IEEE SM, SMART Lab, ISG-Campus, University of Tunis, Liberty Street, 2000, Bardo, Tunis, Tunisia
Slim Bechikh
IEEE SM, Artificial Intelligence Research Institute, University of Veracruz, Calle Paseo 112, Col. Nueva Xalapa, 91097, Xalapa, Veracruz, México
Efrén Mezura-Montes

Corresponding author

Correspondence to Marwa Chabbouh.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chabbouh, M., Bechikh, S., Mezura-Montes, E. et al. Imbalanced multi-label data classification as a bi-level optimization problem: application to miRNA-related diseases diagnosis. Neural Comput & Applic 35, 16285–16303 (2023). https://doi.org/10.1007/s00521-023-08458-4

Received: 04 April 2022
Accepted: 03 March 2023
Published: 22 April 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s00521-023-08458-4
