Abstract
Ensemble methods combined with data preprocessing techniques are among the most widely used approaches to imbalanced data classification. At the same time, the literature suggests that classifier selection and ensemble pruning methods may be able to handle imbalance without preprocessing, because they can exploit the expert knowledge of base models in specific regions of the feature space. This work examines whether ensemble pruning algorithms can raise an ensemble's ability to detect minority class instances to a level comparable to that of methods employing oversampling. Two approaches based on clustering base models in the diversity space, proposed by the author in previous articles, were evaluated in computer experiments on benchmark datasets with a high imbalance ratio (IR). The results and the accompanying statistical analysis confirm the potential of classifier selection methods for classifying data with a skewed class distribution.
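To make the idea of pruning by clustering in the diversity space more concrete, the following is a minimal sketch under stated assumptions, not the author's published algorithm. It assumes a bagged pool of shallow decision trees, uses the pairwise disagreement measure (one of the diversity measures discussed by Kuncheva and Whitaker) to place each model in a "diversity space" given by its row of the disagreement matrix, groups models with k-means, and keeps the member with the highest balanced accuracy from each cluster before combining the survivors by majority voting. The helper names (`train_pool`, `disagreement_matrix`, `cluster_prune`, `majority_vote`), the number of clusters, the selection criterion and the synthetic dataset are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_pool(X, y, n_estimators=30, random_state=0):
    """Hypothetical helper: a bagged pool of shallow decision trees."""
    rng = np.random.default_rng(random_state)
    pool = []
    for i in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
        tree = DecisionTreeClassifier(max_depth=3, random_state=i)
        pool.append(tree.fit(X[idx], y[idx]))
    return pool


def disagreement_matrix(preds):
    """Pairwise disagreement: share of samples on which two models differ.

    preds has shape (n_models, n_samples); the result is (n_models, n_models).
    """
    return (preds[:, None, :] != preds[None, :, :]).mean(axis=2)


def cluster_prune(pool, X_val, y_val, n_clusters=5, random_state=0):
    """Hypothetical pruning: cluster models in the diversity space and keep
    the model with the highest balanced accuracy from each cluster."""
    preds = np.array([model.predict(X_val) for model in pool])
    # Each model is described by its row of the disagreement matrix.
    D = disagreement_matrix(preds)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(D)
    selected = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if members.size == 0:  # possible if several models are identical
            continue
        scores = [balanced_accuracy_score(y_val, preds[i]) for i in members]
        selected.append(pool[members[int(np.argmax(scores))]])
    return selected


def majority_vote(models, X):
    """Majority voting for binary 0/1 labels (a tie goes to class 0)."""
    preds = np.array([model.predict(X) for model in models])
    return (preds.mean(axis=0) > 0.5).astype(int)


# Synthetic binary problem with roughly 5% minority samples (class 1).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
counts = np.bincount(y)
ir = counts.max() / counts.min()  # imbalance ratio: majority / minority size

X_tr, X_rest, y_tr, y_rest = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

pool = train_pool(X_tr, y_tr)
pruned = cluster_prune(pool, X_val, y_val, n_clusters=5)

print(f"IR = {ir:.1f}, pool: {len(pool)} models, pruned: {len(pruned)} models")
print("BAC, full pool:", balanced_accuracy_score(y_te, majority_vote(pool, X_te)))
print("BAC, pruned:   ", balanced_accuracy_score(y_te, majority_vote(pruned, X_te)))
```

Keeping one representative per cluster retains models that behave differently from each other while discarding near-duplicates. Balanced accuracy is used as the selection criterion here because plain accuracy tends to favour models that ignore the minority class; other diversity measures, clustering algorithms or selection criteria are equally plausible choices.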
References
Bora, D.J., Gupta, D., Kumar, A.: A comparative study between fuzzy clustering algorithm and hard clustering algorithm. arXiv preprint arXiv:1404.6059 (2014)
Chen, D., Wang, X.-J., Wang, B.: A dynamic decision-making method based on ensemble methods for complex unbalanced data. In: Cheng, R., Mamoulis, N., Sun, Y., Huang, X. (eds.) WISE 2019. LNCS, vol. 11881, pp. 359–372. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34223-4_23
Cruz, R.M., Sabourin, R., Cavalcanti, G.D.: Dynamic classifier selection: recent advances and perspectives. Inf. Fusion 41(C), 195–216 (2018)
Qiang, F., Shang-xu, H., Sheng-ying, Z.: Clustering-based selective neural network ensemble. J. Zhejiang Univ. Sci. A 6(5), 387–392 (2005). https://doi.org/10.1631/jzus.2005.A0387
Giacinto, G., Roli, F., Fumera, G.: Design of effective multiple classifier systems by clustering of classifiers. In: 15th International Conference on Pattern Recognition, ICPR 2000 (2000)
Ho, T.K., Hull, J.J., Srihari, S.N.: Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell. 16(1), 66–75 (1994)
Klikowski, J., Ksieniewicz, P., Woźniak, M.: A genetic-based ensemble learning applied to imbalanced data classification. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A.J., Menezes, R., Allmendinger, R. (eds.) IDEAL 2019. LNCS, vol. 11872, pp. 340–352. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33617-2_35
Krawczyk, B., Cyganek, B.: Selecting locally specialised classifiers for one-class classification ensembles. Pattern Anal. Appl. 20(2), 427–439 (2015). https://doi.org/10.1007/s10044-015-0505-z
Krawczyk, B.: Learning from imbalanced data: open challenges and future directions. Prog. Artif. Intell. 5(4), 221–232 (2016). https://doi.org/10.1007/s13748-016-0094-0
Ksieniewicz, P.: Undersampled majority class ensemble for highly imbalanced binary classification. In: Proceedings of the 2nd International Workshop on Learning with Imbalanced Domains: Theory and Applications, ECML-PKDD, Dublin, Ireland. Proceedings of Machine Learning Research, vol. 94, pp. 82–94. PMLR (2018)
Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley, Hoboken (2004)
Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn. 51(2), 181–207 (2003)
Lazarevic, A., Obradovic, Z.: The effective pruning of neural network classifiers. In: IEEE/INNS International Joint Conference on Neural Networks, IJCNN 2001 (2001)
Margineantu, D.D., Dietterich, T.G.: Pruning adaptive boosting. In: Proceedings of the 14th International Conference on Machine Learning, ICML 1997, San Francisco, CA, USA, pp. 211–218. Morgan Kaufmann Publishers Inc. (1997)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Ruta, D., Gabrys, B.: A theoretical analysis of the limits of majority voting errors for multiple classifier systems. Pattern Anal. Appl. 5(4), 333–350 (2002)
Ruta, D., Gabrys, B.: Classifier selection for majority voting. Inf. Fusion 6(1), 63–81 (2005)
Wojciechowski, S., Woźniak, M.: Employing decision templates to imbalanced data classification. In: de la Cal, E.A., Villar Flecha, J.R., Quintián, H., Corchado, E. (eds.) HAIS 2020. LNCS (LNAI), vol. 12344, pp. 120–131. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61705-9_11
Zhang, H., Cao, L.: A spectral clustering based ensemble pruning approach. Neurocomputing 139, 289–297 (2014)
Zhou, Z.H.: Ensemble Methods: Foundations and Algorithms. Chapman & Hall/CRC, Boca Raton (2012)
Zhou, Z.H., Wu, J., Tang, W.: Ensembling neural networks: many could be better than all. Artif. Intell. 137(1–2), 239–263 (2002)
Zyblewski, P., Woźniak, M.: Clustering-based ensemble pruning and multistage organization using diversity. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds.) HAIS 2019. LNCS (LNAI), vol. 11734, pp. 287–298. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29859-3_25
Zyblewski, P., Woźniak, M.: Novel clustering-based pruning algorithms. Pattern Anal. Appl. 23(3), 1049–1058 (2020). https://doi.org/10.1007/s10044-020-00867-8
Acknowledgment
This work was supported by the Polish National Science Centre under the grant No. 2017/27/B/ST6/01325.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zyblewski, P. (2021). Clustering-Based Ensemble Pruning in the Imbalanced Data Classification. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol. 12744. Springer, Cham. https://doi.org/10.1007/978-3-030-77967-2_14
DOI: https://doi.org/10.1007/978-3-030-77967-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77966-5
Online ISBN: 978-3-030-77967-2
eBook Packages: Computer Science, Computer Science (R0)