New Ordering-Based Pruning Metrics for Ensembles of Classifiers in Imbalanced Datasets

Galar, Mikel; Fernández, Alberto; Barrenechea, Edurne; Bustince, Humberto; Herrera, Francisco

doi:10.1007/978-3-319-26227-7_1

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 403)

Abstract

The task of classification with imbalanced datasets has attracted considerable interest from researchers in recent years. The reason is that many real-world applications exhibit class imbalance, which prevents standard learning algorithms from reaching their expected performance. Accordingly, many approaches have been designed to address this problem from different perspectives, namely data preprocessing, algorithmic modification, and cost-sensitive learning. The extension of these techniques to ensembles of classifiers has been shown to be very effective in terms of the quality of the output models. However, the optimal number of classifiers in the pool cannot be known a priori, and this choice can alter the behaviour of the system. For this reason, ordering-based pruning techniques have been proposed to address this issue in standard classifier learning problems. The drawback is that those metrics are not designed specifically for imbalanced classification, which hinders their performance in this context. In this work, we propose two novel adaptations of ordering-based pruning metrics for imbalanced classification, specifically margin distance minimization and the boosting-based approach. Through a complete experimental study, our analysis shows the merit of both schemes in comparison with the unpruned ensembles and the standard pruning metrics in Bagging-based ensembles.
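
To make the idea of ordering-based pruning concrete, the following is a minimal sketch of the standard (unweighted) margin distance minimization ordering as it is usually described in the ordered-aggregation literature. It is not the imbalance-aware adaptation proposed in this chapter, whose details the abstract does not give. The function names, the toy data, and the default target value p = 0.075 are assumptions made for illustration only.

```python
from typing import List, Sequence


def signature_vectors(predictions: Sequence[Sequence[int]],
                      labels: Sequence[int]) -> List[List[int]]:
    """One vector per classifier: +1 where it is right, -1 where it is wrong."""
    return [[1 if pred == y else -1 for pred, y in zip(preds, labels)]
            for preds in predictions]


def squared_distance(running: Sequence[float], sig: Sequence[int],
                     total: int, p: float) -> float:
    """Squared distance from the target point (p, ..., p) to the averaged
    signature vector obtained if `sig` were added to the selected set."""
    return sum((p - (r + s) / total) ** 2 for r, s in zip(running, sig))


def mdm_order(predictions: Sequence[Sequence[int]],
              labels: Sequence[int],
              p: float = 0.075) -> List[int]:
    """Greedy ordering: at each step pick the classifier that moves the
    ensemble's averaged signature vector closest to the target point.
    `p` is an illustrative default, not a value taken from the chapter."""
    sigs = signature_vectors(predictions, labels)
    total = len(sigs)
    running = [0.0] * len(labels)      # sum of signatures selected so far
    remaining = list(range(total))
    order: List[int] = []
    while remaining:
        best = min(remaining,
                   key=lambda k: squared_distance(running, sigs[k], total, p))
        order.append(best)
        remaining.remove(best)
        running = [r + s for r, s in zip(running, sigs[best])]
    return order


# Hypothetical selection set: three majority examples (0), one minority (1).
labels = [0, 0, 0, 1]
predictions = [
    [0, 0, 0, 0],   # classifier 0: misses only the minority example
    [1, 1, 0, 1],   # classifier 1: right on two examples
    [1, 1, 1, 1],   # classifier 2: right only on the minority example
]
order = mdm_order(predictions, labels)
print(order)        # [0, 2, 1]
```

In this toy case the least accurate classifier is ranked second because it is correct exactly where the first one fails. A pruned ensemble keeps only the first few classifiers of the returned order; how many to keep is a separate choice.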

Acknowledgments

This work was supported by the Spanish Ministry of Science and Technology under projects TIN-2011-28488, TIN-2012-33856, TIN2013-40765-P; the Andalusian Research Plans P11-TIC-7765 and P10-TIC-6858; and both the University of Jaén and Caja Rural Provincial de Jaén under project UJA2014/06/15.

Author information

Authors and Affiliations

Departamento de Automática y Computación, ISC (Institute of Smart Cities), Universidad Pública de Navarra, Pamplona, Spain
Mikel Galar, Edurne Barrenechea & Humberto Bustince
Department of Computer Science, University of Jaén, Jaén, Spain
Alberto Fernández
Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
Francisco Herrera

Corresponding author

Correspondence to Mikel Galar.

Editor information

Editors and Affiliations

Department of Systems, Wrocław University of Technology, Wroclaw, Poland
Robert Burduk
Department of Systems and Computer, Wrocław University of Technology, Wroclaw, Poland
Konrad Jackowski
Department of Systems and Computer, Wrocław University of Technology, Wroclaw, Poland
Marek Kurzyński
Dept. of Systems and Computer Networks, Wrocław University of Technology, Wroclaw, Poland
Michał Woźniak
Department of Systems, Wrocław University of Technology, Wroclaw, Poland
Andrzej Żołnierek

Cite this paper

Galar, M., Fernández, A., Barrenechea, E., Bustince, H., Herrera, F. (2016). New Ordering-Based Pruning Metrics for Ensembles of Classifiers in Imbalanced Datasets. In: Burduk, R., Jackowski, K., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds) Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015. Advances in Intelligent Systems and Computing, vol 403. Springer, Cham. https://doi.org/10.1007/978-3-319-26227-7_1

DOI: https://doi.org/10.1007/978-3-319-26227-7_1
Published: 05 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26225-3
Online ISBN: 978-3-319-26227-7
eBook Packages: Engineering, Engineering (R0)