Clustering-Based Ensemble Pruning in the Imbalanced Data Classification

  • Conference paper
Computational Science – ICCS 2021 (ICCS 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12744)

Abstract

Ensemble methods combined with data preprocessing techniques are among the most widely used approaches to imbalanced data classification. At the same time, the literature suggests that classifier selection and ensemble pruning methods can cope with imbalance without preprocessing, because they can exploit the local expertise of base models in specific regions of the feature space. This work examines whether ensemble pruning algorithms can raise an ensemble's ability to detect minority class instances to a level comparable with methods that use oversampling. Two approaches based on clustering base models in the diversity space, proposed by the author in earlier articles, were evaluated in computer experiments on benchmark datasets with a high Imbalance Ratio. The results and the accompanying statistical analysis confirm the potential of classifier selection methods for classifying data with a skewed class distribution.
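
The abstract names the core idea (cluster the base models in a diversity space, then keep only a few of them) without spelling out the mechanics. The following is a minimal, self-contained sketch of one plausible reading of that idea, not the author's implementation. Assumptions, all hypothetical: binary labels; pairwise disagreement as the diversity measure; each classifier represented by its row of the pairwise diversity matrix; a basic k-means with deterministic farthest-first initialisation; the classifier with the highest balanced accuracy kept from each cluster; majority voting over the retained models. The function names (`imbalance_ratio`, `prune`, and so on) are illustrative only.

```python
from collections import Counter
from typing import List, Sequence

Labels = Sequence[int]


def imbalance_ratio(y: Labels) -> float:
    """Imbalance Ratio (IR): majority class size divided by minority class size."""
    counts = Counter(y).values()
    return max(counts) / min(counts)


def disagreement(a: Labels, b: Labels) -> float:
    """Share of samples on which two classifiers predict different labels.

    For binary problems this equals the oracle-based disagreement measure,
    because two differing predictions means exactly one of them is correct.
    """
    return sum(p != q for p, q in zip(a, b)) / len(a)


def balanced_accuracy(y: Labels, pred: Labels) -> float:
    """Mean per-class recall, so the minority class weighs as much as the majority."""
    recalls = []
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        recalls.append(sum(pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Basic k-means with deterministic farthest-first initialisation."""
    def sq_dist(p, q):
        return sum((x - z) ** 2 for x, z in zip(p, q))

    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def prune(val_preds: List[Labels], y_val: Labels, n_clusters: int) -> List[int]:
    """Return indices of the classifiers kept: the best one from each cluster."""
    m = len(val_preds)
    # Diversity space: classifier i is described by its disagreement with every model.
    space = [[disagreement(val_preds[i], val_preds[j]) for j in range(m)] for i in range(m)]
    labels = kmeans(space, n_clusters)
    kept = []
    for c in sorted(set(labels)):
        members = [i for i in range(m) if labels[i] == c]
        kept.append(max(members, key=lambda i: balanced_accuracy(y_val, val_preds[i])))
    return sorted(kept)


def majority_vote(preds: List[Labels]) -> List[int]:
    """Sample-wise majority vote over the retained classifiers."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*preds)]


if __name__ == "__main__":
    y_val = [0, 0, 0, 0, 0, 0, 1, 1]  # IR = 6 / 2 = 3.0
    val_preds = [
        [0, 0, 0, 0, 0, 0, 0, 0],  # always predicts the majority class
        [0, 0, 0, 0, 0, 0, 0, 0],  # duplicate of the first model
        [0, 0, 0, 0, 0, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 1, 0],
    ]
    kept = prune(val_preds, y_val, n_clusters=3)
    print(imbalance_ratio(y_val), kept)  # 3.0 [2, 3, 4]
    print(majority_vote([val_preds[i] for i in kept]))
```

Representing each model by its row of the diversity matrix places models that disagree with the rest of the pool in similar ways close together, so keeping one model per cluster removes redundant members (here, the two identical majority-only models collapse into a single cluster). Picking the per-cluster winner by balanced accuracy rather than plain accuracy is a further assumption, chosen so that the selection step does not favour models that ignore the minority class.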

Notes

  1. https://github.com/w4k2/iccs21-ensemble-pruning

References

  1. Bora, D.J., Gupta, D., Kumar, A.: A comparative study between fuzzy clustering algorithm and hard clustering algorithm. arXiv preprint arXiv:1404.6059 (2014)

  2. Chen, D., Wang, X.-J., Wang, B.: A dynamic decision-making method based on ensemble methods for complex unbalanced data. In: Cheng, R., Mamoulis, N., Sun, Y., Huang, X. (eds.) WISE 2020. LNCS, vol. 11881, pp. 359–372. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34223-4_23

  3. Cruz, R.M., Sabourin, R., Cavalcanti, G.D.: Dynamic classifier selection: recent advances and perspectives. Inf. Fusion 41(C), 195–216 (2018)

  4. Qiang, F., Shang-xu, H., Sheng-ying, Z.: Clustering-based selective neural network ensemble. J. Zhejiang Univ. Sci. A 6(5), 387–392 (2005). https://doi.org/10.1631/jzus.2005.A0387

  5. Giacinto, G., Roli, F., Fumera, G.: Design of effective multiple classifier systems by clustering of classifiers. In: 15th International Conference on Pattern Recognition, ICPR 2000 (2000)

  6. Ho, T.K., Hull, J.J., Srihari, S.N.: Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell. 16(1), 66–75 (1994)

  7. Klikowski, J., Ksieniewicz, P., Woźniak, M.: A genetic-based ensemble learning applied to imbalanced data classification. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A.J., Menezes, R., Allmendinger, R. (eds.) IDEAL 2019. LNCS, vol. 11872, pp. 340–352. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33617-2_35

  8. Krawczyk, B., Cyganek, B.: Selecting locally specialised classifiers for one-class classification ensembles. Pattern Anal. Appl. 20(2), 427–439 (2015). https://doi.org/10.1007/s10044-015-0505-z

  9. Krawczyk, B.: Learning from imbalanced data: open challenges and future directions. Prog. Artif. Intell. 5(4), 221–232 (2016). https://doi.org/10.1007/s13748-016-0094-0

  10. Ksieniewicz, P.: Undersampled majority class ensemble for highly imbalanced binary classification. In: Proceedings of the 2nd International Workshop on Learning with Imbalanced Domains: Theory and Applications. Proceedings of Machine Learning Research, PMLR, ECML-PKDD, Dublin, Ireland, vol. 94, pp. 82–94, 10 September 2018

  11. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley, Hoboken (2004)

  12. Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn. 51(2), 181–207 (2003)

  13. Lazarevic, A., Obradovic, Z.: The effective pruning of neural network classifiers. In: 2001 IEEE/INNS International Conference on Neural Networks, IJCNN 2001 (2001)

  14. Margineantu, D.D., Dietterich, T.G.: Pruning adaptive boosting. In: Proceedings of the 14th International Conference on Machine Learning, ICML 1997, San Francisco, CA, USA, pp. 211–218. Morgan Kaufmann Publishers Inc. (1997)

  15. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  16. Ruta, D., Gabrys, B.: A theoretical analysis of the limits of majority voting errors for multiple classifier systems. Pattern Anal. Appl. 2(4), 333–350 (2002)

  17. Ruta, D., Gabrys, B.: Classifier selection for majority voting. Inf. Fusion 6(1), 63–81 (2005)

  18. Wojciechowski, S., Woźniak, M.: Employing decision templates to imbalanced data classification. In: de la Cal, E.A., Villar Flecha, J.R., Quintián, H., Corchado, E. (eds.) HAIS 2020. LNCS (LNAI), vol. 12344, pp. 120–131. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61705-9_11

  19. Zhang, H., Cao, L.: A spectral clustering based ensemble pruning approach. Neurocomputing 139, 289–297 (2014)

  20. Zhou, Z.H.: Ensemble Methods: Foundations and Algorithms. Chapman & Hall CRC, Boca Raton (2012)

  21. Zhou, Z.H., Wu, J., Tang, W.: Ensembling neural networks: many could be better than all. Artif. Intell. 137(1–2), 239–263 (2002)

  22. Zyblewski, P., Woźniak, M.: Clustering-based ensemble pruning and multistage organization using diversity. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds.) HAIS 2019. LNCS (LNAI), vol. 11734, pp. 287–298. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29859-3_25

  23. Zyblewski, P., Woźniak, M.: Novel clustering-based pruning algorithms. Pattern Anal. Appl. 23(3), 1049–1058 (2020). https://doi.org/10.1007/s10044-020-00867-8

Acknowledgment

This work was supported by the Polish National Science Centre under the grant No. 2017/27/B/ST6/01325.

Author information

Correspondence to Paweł Zyblewski.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zyblewski, P. (2021). Clustering-Based Ensemble Pruning in the Imbalanced Data Classification. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12744. Springer, Cham. https://doi.org/10.1007/978-3-030-77967-2_14

  • DOI: https://doi.org/10.1007/978-3-030-77967-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77966-5

  • Online ISBN: 978-3-030-77967-2

  • eBook Packages: Computer Science, Computer Science (R0)
