Skip to main content

New Ordering-Based Pruning Metrics for Ensembles of Classifiers in Imbalanced Datasets

  • Conference paper
  • First Online:
Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015

Abstract

The task of classification with imbalanced datasets have attracted quite interest from researchers in the last years. The reason behind this fact is that many applications and real problems present this feature, causing standard learning algorithms not reaching the expected performance. Accordingly, many approaches have been designed to address this problem from different perspectives, i.e., data preprocessing, algorithmic modification, and cost-sensitive learning. The extension of the former techniques to ensembles of classifiers has shown to be very effective in terms of quality of the output models. However, the optimal value for the number of classifiers in the pool cannot be known a priori, which can alter the behaviour of the system. For this reason, ordering-based pruning techniques have been proposed to address this issue in standard classifier learning problems. The hitch is that those metrics are not designed specifically for imbalanced classification, thus hindering the performance in this context. In this work, we propose two novel adaptations for ordering-based pruning metrics in imbalanced classification, specifically the margin distance minimization and the boosting-based approach. Throughout a complete experimental study, our analysis shows the goodness of both schemes in contrast with the unpruned ensembles and the standard pruning metrics in Bagging-based ensembles.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 169.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 219.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Barandela, R., Valdovinos, R., Sánchez, J.: New applications of ensembles of classifiers. Pattern Anal. Appl. 6(3), 245–256 (2003)

    Article  MathSciNet  Google Scholar 

  2. Barandela, R., Sánchez, J.S., García, V., Rangel, E.: Strategies for learning in class imbalance problems. Pattern Recognit. 36(3), 849–851 (2003)

    Article  Google Scholar 

  3. Batista, G.E.A.P.A., Prati, R.C., Monard, M.C.: A study of the behaviour of several methods for balancing machine learning training data. SIGKDD Explor. 6(1), 20–29 (2004)

    Article  Google Scholar 

  4. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)

    MathSciNet  MATH  Google Scholar 

  5. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)

    MathSciNet  MATH  Google Scholar 

  6. Freund, Y., Schapire, R.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  7. Galar, M., Fernández, A., Barrenechea, E., Bustince, H., Herrera, F.: A review on ensembles for class imbalance problem: bagging, boosting and hybrid based approaches. IEEE Trans. Syst., Man Cybern. Part C: Appl. Rev. 42(4), 463–484 (2012)

    Article  Google Scholar 

  8. Galar, M., Fernández, A., Barrenechea, E., Herrera, F.: Eusboost: enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling. Pattern Recognit. 46(12), 3460–3471 (2013)

    Article  Google Scholar 

  9. He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)

    Article  Google Scholar 

  10. Hernández-Lobato, D., Martínez-Muñoz, G., Suárez, A.: Statistical instance-based pruning in ensembles of independent classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 364–369 (2009)

    Article  Google Scholar 

  11. Huang, J., Ling, C.X.: Using AUC and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng. 17(3), 299–310 (2005)

    Article  Google Scholar 

  12. Kuncheva, L.I.: Diversity in multiple classifier systems. Inf. Fusion 6(1), 3–4 (2005)

    Article  MathSciNet  Google Scholar 

  13. López, V., Fernández, A., García, S., Palade, V., Herrera, F.: An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 250(20), 113–141 (2013)

    Article  Google Scholar 

  14. López, V., Fernández, A., Herrera, F.: On the importance of the validation technique for classification with imbalanced datasets: addressing covariate shift when data is skewed. Inf. Sci. 257, 1–13 (2014)

    Article  Google Scholar 

  15. Martínez-Muñoz, G., Suárez, A.: Using boosting to prune bagging ensembles. Pattern Recognit. Lett. 28(1), 156–165 (2007)

    Article  Google Scholar 

  16. Martínez-Muñoz, G., Hernández-Lobato, D., Suárez, A.: An analysis of ensemble pruning techniques based on ordered aggregation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 245–259 (2009)

    Article  Google Scholar 

  17. Moreno-Torres, J.G., Sáez, J.A., Herrera, F.: Study on the impact of partition-induced dataset shift on k-fold cross-validation. IEEE Trans. Neural Netw. Learn. Syst. 23(8), 1304–1313 (2012)

    Article  Google Scholar 

  18. Polikar, R.: Ensemble based systems in decision making. IEEE Circuits Syst. Mag. 6(3), 21–45 (2006)

    Article  Google Scholar 

  19. Prati, R.C., Batista, G.E.A.P.A., Silva, D.F.: Class imbalance revisited: a new experimental setup to assess the performance of treatment methods. Knowledge and Information Systems, 1–25 (2014, in press)

    Google Scholar 

  20. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kauffman, San Mateo (1993)

    Google Scholar 

  21. Rokach, L.: Ensemble-based classifiers. Artif. Intell. Rev. 33(1), 1–39 (2010)

    Article  MathSciNet  Google Scholar 

  22. Sun, Y., Wong, A.K.C., Kamel, M.S.: Classification of imbalanced data: a review. Int. J. Pattern Recognit. Artif. Intell. 23(4), 687–719 (2009)

    Article  Google Scholar 

  23. Wang, S., Yao, X.: Diversity analysis on imbalanced data sets by using ensemble models. In: Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Data Mining (CIDM’09), 324–331 (2009)

    Google Scholar 

  24. Wilcoxon, F.: Individual comparisons by ranking methods. Biom. Bull. 1(6), 80–83 (1945)

    Article  Google Scholar 

  25. Zadrozny, B., Langford, J., Abe, N.: Cost-sensitive learning by cost-proportionate example weighting. In: Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM’03), 435–442 (2003)

    Google Scholar 

  26. Zhang, Y., Burer, S., Street, W.N.: Ensemble pruning via semi-definite programming. J. Mach. Learn. Res. 7, 1315–1338 (2006)

    MathSciNet  MATH  Google Scholar 

  27. Zhou, Z.H., Wu, J., Tang, W.: Ensembling neural networks: many could be better than all. Artif. Intell. 137(1–2), 239–263 (2002)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgments

This work was supported by the Spanish Ministry of Science and Technology under projects TIN-2011-28488, TIN-2012-33856, TIN2013-40765-P; the Andalusian Research Plans P11-TIC-7765 and P10-TIC-6858; and both the University of Jaén and Caja Rural Provincial de Jaén under project UJA2014/06/15.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mikel Galar .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Galar, M., Fernández, A., Barrenechea, E., Bustince, H., Herrera, F. (2016). New Ordering-Based Pruning Metrics for Ensembles of Classifiers in Imbalanced Datasets. In: Burduk, R., Jackowski, K., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds) Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015. Advances in Intelligent Systems and Computing, vol 403. Springer, Cham. https://doi.org/10.1007/978-3-319-26227-7_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-26227-7_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26225-3

  • Online ISBN: 978-3-319-26227-7

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics