Abstract
To identify the most representative subset of features in high-dimensional data, a feature selection algorithm (AP-MSU) based on feature clustering and information theory is proposed. After a preliminary filter-based screening of relevant features, the algorithm applies affinity propagation (AP) clustering together with multivariate symmetric uncertainty (MSU), which better captures the interactions among multiple feature variables and between features and the target variable. Candidate features are then evaluated sequentially with an MSU-based feature quality metric that accounts for both redundancy and interaction with the features already selected, and redundant features are removed by assessing, at low computational cost, how much effective classification information each feature contributes. Experimental results show that the AP-MSU algorithm selects good feature subsets on both binary and multi-class gene expression datasets and achieves good classification performance with different classifiers. Moreover, the lower-dimensional feature subsets obtained by the algorithm improve classification accuracy.
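To make the pipeline described above concrete, the sketch below combines the two ingredients the abstract names: symmetric uncertainty (SU) and a multivariate extension (MSU) as information-theoretic measures, and affinity propagation clustering over a feature-similarity matrix. This is a minimal illustration under assumptions, not the authors' published AP-MSU algorithm: the function names (`entropy`, `su`, `msu`, `ap_msu_sketch`), the use of scikit-learn's `AffinityPropagation`, the discretisation of expression values, and the per-cluster relevance pick (which stands in for the paper's sequential MSU-based evaluation) are all illustrative choices.

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import AffinityPropagation


def entropy(*cols):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def su(x, y):
    """Pairwise symmetric uncertainty: SU(x, y) = 2 * I(x; y) / (H(x) + H(y))."""
    hx, hy, hxy = entropy(x), entropy(y), entropy(x, y)
    denom = hx + hy
    return 0.0 if denom == 0 else 2.0 * (hx + hy - hxy) / denom


def msu(columns):
    """Multivariate symmetric uncertainty of a variable set: total correlation
    normalised by the sum of marginal entropies (one common formulation)."""
    n = len(columns)
    marginals = sum(entropy(c) for c in columns)
    if n < 2 or marginals == 0:
        return 0.0
    return n / (n - 1) * (1.0 - entropy(*columns) / marginals)


def ap_msu_sketch(X, y):
    """Cluster features with affinity propagation on an SU similarity matrix,
    then keep one representative per cluster by its relevance to the class y.
    A simplified stand-in for the paper's sequential MSU-based evaluation."""
    d = X.shape[1]
    sim = np.zeros((d, d))
    for i, j in combinations(range(d), 2):
        sim[i, j] = sim[j, i] = su(X[:, i], X[:, j])
    ap = AffinityPropagation(affinity="precomputed", damping=0.9,
                             max_iter=1000, random_state=0).fit(sim)
    selected = []
    for label in np.unique(ap.labels_):
        members = np.where(ap.labels_ == label)[0]
        best = max(members, key=lambda f: su(X[:, f], y))
        selected.append(int(best))
    return sorted(selected)


if __name__ == "__main__":
    # Toy example: 12 discretised "genes", 60 samples; real expression data
    # would be discretised (e.g. equal-width binning) before computing SU/MSU.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(60, 12))
    y = (X[:, 0] + X[:, 3] > 2).astype(int)
    print("MSU of first three features:", round(msu([X[:, 0], X[:, 1], X[:, 2]]), 4))
    print("Selected feature indices:", ap_msu_sketch(X, y))
```

In this simplified version, the AP exemplars already group redundant features together, so keeping one member per cluster by its SU with the class variable approximates the redundancy removal the abstract describes; the paper's metric instead scores candidates with MSU against the currently selected set.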
Acknowledgment
This work was supported by the National Natural Science Foundation of China-Joint Fund for Enterprises-Key Support Program Project, under Grant U22B2049.