Abstract
To identify the most representative subset of features in high-dimensional data, a feature selection algorithm (AP-MSU) based on feature clustering and information theory is proposed. After a preliminary filter-based screening of relevant features, the algorithm applies affinity propagation (AP) clustering together with multivariate symmetric uncertainty (MSU), which better captures the interactions among multiple feature variables and between features and the target variable. Candidate features are then evaluated sequentially with an MSU-based feature quality metric that accounts for both redundancy and interaction with the features already selected, and redundant features are removed by assessing, at low computational cost, how much effective classification information each feature contributes. Experimental results show that the AP-MSU algorithm selects good feature subsets on both binary and multi-class gene expression datasets and achieves good classification performance with different classifiers. Moreover, the lower-dimensional feature subsets obtained by the algorithm improve classification accuracy.
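To make the pipeline described above concrete, the sketch below combines the two ingredients the abstract names: symmetric uncertainty (SU) and a multivariate extension (MSU) as information-theoretic measures, and affinity propagation clustering over a feature-similarity matrix. This is a minimal illustration under assumptions, not the authors' published AP-MSU algorithm: the function names (`entropy`, `su`, `msu`, `ap_msu_sketch`), the use of scikit-learn's `AffinityPropagation`, the discretisation of expression values, and the per-cluster relevance pick (which stands in for the paper's sequential MSU-based evaluation) are all illustrative choices.

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import AffinityPropagation


def entropy(*cols):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def su(x, y):
    """Pairwise symmetric uncertainty: SU(x, y) = 2 * I(x; y) / (H(x) + H(y))."""
    hx, hy, hxy = entropy(x), entropy(y), entropy(x, y)
    denom = hx + hy
    return 0.0 if denom == 0 else 2.0 * (hx + hy - hxy) / denom


def msu(columns):
    """Multivariate symmetric uncertainty of a variable set: total correlation
    normalised by the sum of marginal entropies (one common formulation)."""
    n = len(columns)
    marginals = sum(entropy(c) for c in columns)
    if n < 2 or marginals == 0:
        return 0.0
    return n / (n - 1) * (1.0 - entropy(*columns) / marginals)


def ap_msu_sketch(X, y):
    """Cluster features with affinity propagation on an SU similarity matrix,
    then keep one representative per cluster by its relevance to the class y.
    A simplified stand-in for the paper's sequential MSU-based evaluation."""
    d = X.shape[1]
    sim = np.zeros((d, d))
    for i, j in combinations(range(d), 2):
        sim[i, j] = sim[j, i] = su(X[:, i], X[:, j])
    ap = AffinityPropagation(affinity="precomputed", damping=0.9,
                             max_iter=1000, random_state=0).fit(sim)
    selected = []
    for label in np.unique(ap.labels_):
        members = np.where(ap.labels_ == label)[0]
        best = max(members, key=lambda f: su(X[:, f], y))
        selected.append(int(best))
    return sorted(selected)


if __name__ == "__main__":
    # Toy example: 12 discretised "genes", 60 samples; real expression data
    # would be discretised (e.g. equal-width binning) before computing SU/MSU.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(60, 12))
    y = (X[:, 0] + X[:, 3] > 2).astype(int)
    print("MSU of first three features:", round(msu([X[:, 0], X[:, 1], X[:, 2]]), 4))
    print("Selected feature indices:", ap_msu_sketch(X, y))
```

In this simplified version, the AP exemplars already group redundant features together, so keeping one member per cluster by its SU with the class variable approximates the redundancy removal the abstract describes; the paper's metric instead scores candidates with MSU against the currently selected set.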
Acknowledgment
This work was supported by the National Natural Science Foundation of China-Joint Fund for Enterprises-Key Support Program Project, under Grant U22B2049.