Abstract
In this paper, we aim to improve classification performance on imbalanced data by mitigating the impact of the curse of dimensionality, especially in minority classes with few samples. We exploit a class hierarchy realized as a binary tree in which each node holds a subset of classes. We construct this binary tree in a top-down manner, taking into consideration the separability of classes and the size of the feature subset. The generalization performance is expected to improve, especially in minority classes with a small number of samples, and the interpretability of the decision rule is expected to be enhanced by the small number of features. Experimental results showed a remarkable improvement by the proposed method in large-scale problems with many classes, e.g., from 48% to 62% in balanced accuracy. In addition, only one feature was chosen at every node of the class hierarchy in all four datasets, yielding highly interpretable classification rules.
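The results above are reported in balanced accuracy. As a minimal sketch, assuming the standard definition (the mean of per-class recall), the following shows why this metric is more informative than plain accuracy on imbalanced data. The function name `balanced_accuracy` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # number of correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 8 majority samples ("a") and 2 minority samples ("b").
y_true = ["a"] * 8 + ["b"] * 2
# A classifier that always predicts the majority class.
y_pred = ["a"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, bal_acc)
```

In this toy case, plain accuracy rewards the classifier for ignoring the minority class entirely, whereas balanced accuracy averages a recall of 1 on the majority class with a recall of 0 on the minority class.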
Acknowledgment
This work was partially supported by JSPS KAKENHI Grant Number 19H04128.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Horio, T., Kudo, M. (2021). Feature Selection with Class Hierarchy for Imbalance Problems. In: Hernández Heredia, Y., Milián Núñez, V., Ruiz Shulcloper, J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2021. Lecture Notes in Computer Science, vol 13055. Springer, Cham. https://doi.org/10.1007/978-3-030-89691-1_23
DOI: https://doi.org/10.1007/978-3-030-89691-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89690-4
Online ISBN: 978-3-030-89691-1
eBook Packages: Computer Science, Computer Science (R0)