Abstract
In this work we study how conventional feature selection methods can be applied to Hierarchical Multi-label Classification problems. In Hierarchical Multi-label Classification, an instance can belong to two or more classes (labels) simultaneously, and these classes are hierarchically structured. Feature selection plays an important role in Machine Learning classification tasks, since it can effectively reduce dataset dimensionality by removing irrelevant and/or redundant features, which can in turn improve classification accuracy. Although many relevant real-world problems are both hierarchical and multi-label, most related research addresses feature selection for single-label problems. Even when a proposal deals with multi-label problems, the classes are often not associated with a hierarchical structure. To address this gap, we propose four hierarchical strategies that combine the Binary Relevance (BR) and Label Powerset (LP) multi-label transformations with the attribute evaluators ReliefF (RF) and Information Gain (IG). We tested our strategies on 10 real-world datasets from the functional genomics field that are commonly used in Hierarchical Multi-label Classification studies. Our main result is that three of the four proposed strategies selected relevant feature subsets while maintaining predictive performance comparable to that obtained with the complete feature set.
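As a rough illustration of the two flat transformations named in the abstract, the following Python sketch scores discrete features with Information Gain under Binary Relevance (one binary problem per label) and under Label Powerset (each distinct label set treated as a single class). It is not the chapter's method: the hierarchical strategies themselves are not described in the abstract, and the function names, the averaging of per-label scores in the BR case, and the toy data are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, target):
    """IG = H(target) - H(target | feature) for a discrete feature."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def br_scores(X, Y):
    """Binary Relevance: score each feature against each label separately.

    Averaging the per-label scores is an assumption of this sketch;
    other aggregations (max, sum) are equally plausible.
    """
    n_features, n_labels = len(X[0]), len(Y[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        per_label = [
            information_gain(column, [y[k] for y in Y]) for k in range(n_labels)
        ]
        scores.append(sum(per_label) / n_labels)
    return scores


def lp_scores(X, Y):
    """Label Powerset: treat each distinct label combination as one class."""
    classes = [tuple(y) for y in Y]
    return [
        information_gain([row[j] for row in X], classes)
        for j in range(len(X[0]))
    ]


# Hypothetical toy data: 4 instances, 2 binary features, 2 labels.
# Feature 0 determines both labels; feature 1 carries no information.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 1], [0, 1], [1, 0], [1, 0]]

print("BR + IG:", br_scores(X, Y))
print("LP + IG:", lp_scores(X, Y))
```

On this toy data both transformations rank feature 0 above feature 1. In general they can disagree, because LP captures label co-occurrence while BR treats each label independently.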
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
da Silva, L.V.M., Cerri, R. (2021). Feature Selection for Hierarchical Multi-label Classification. In: Abreu, P.H., Rodrigues, P.P., Fernández, A., Gama, J. (eds) Advances in Intelligent Data Analysis XIX. IDA 2021. Lecture Notes in Computer Science, vol 12695. Springer, Cham. https://doi.org/10.1007/978-3-030-74251-5_16
DOI: https://doi.org/10.1007/978-3-030-74251-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74250-8
Online ISBN: 978-3-030-74251-5
eBook Packages: Computer Science, Computer Science (R0)