Abstract
This work analyses the complementarity and contrast between two metrics commonly used to evaluate the quality of a binary classifier: the correct classification rate, or accuracy (C), and the F1 score, which is very popular when dealing with imbalanced datasets. Based on this analysis, a set of constraints relating C and F1 is defined as a function of the ratio of positive patterns in the dataset. We evaluate the possibility of using a multi-objective evolutionary algorithm guided by this pair of metrics to optimise binary classification models. To check the validity of the constraints, we perform an empirical analysis on 26 benchmark datasets from the UCI repository and a liver transplant dataset. The results show that the constraints hold and that simultaneously optimising the pair (C, F1) generally leads to a balanced accuracy for both classes. The experiments also reveal that, in some cases, better results are obtained by taking the majority class as the positive class rather than the minority one, which is the most common choice with imbalanced datasets.
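To make the contrast between the two metrics concrete, the following sketch (illustrative only; it does not reproduce the paper's constraints, and the toy data are hypothetical) computes accuracy (C) and F1 from a binary confusion matrix and shows how the two can diverge on an imbalanced dataset:

```python
# Illustrative sketch: accuracy (C) vs. F1 on imbalanced data.
# The dataset and classifier below are toy examples, not from the paper.

def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    # C = proportion of correctly classified patterns
    return (tp + tn) / (tp + fp + fn + tn)

def f1(tp, fp, fn, tn):
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# A 90/10 imbalanced toy set: the trivial all-negative classifier reaches
# C = 0.9 yet F1 = 0, so optimising C alone can ignore the minority class.
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100
tp, fp, fn, tn = confusion(y_true, y_pred)
print(accuracy(tp, fp, fn, tn), f1(tp, fp, fn, tn))  # 0.9 0.0
```

This divergence is precisely why treating (C, F1) as a pair of objectives, rather than optimising either alone, is attractive for imbalanced problems.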
Funding
This work has been partially subsidised by the TIN2014-54583-C2-1-R, TIN2017-85887-C2-1-P and TIN2017-90567-REDT projects of the Spanish Ministry of Economy and Competitiveness (MINECO), and by FEDER funds of the European Union.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Fernández, J.C., Carbonero, M., Gutiérrez, P.A. et al. Multi-objective evolutionary optimization using the relationship between F1 and accuracy metrics in classification tasks. Appl Intell 49, 3447–3463 (2019). https://doi.org/10.1007/s10489-019-01447-y