Research on Feature Selection Methods Based on Feature Clustering and Information Theory

  • Conference paper
Advanced Intelligent Computing Technology and Applications (ICIC 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14874)

Abstract

To identify the most representative subset of features in high-dimensional data, a feature selection algorithm (AP-MSU) based on feature clustering and information theory is proposed. After a filter-based feature selection algorithm performs a preliminary screening of relevant features, the algorithm applies affinity propagation (AP) clustering and multivariate symmetric uncertainty (MSU), which better capture the interactions among multiple feature variables and between those features and the target variable. The features are then evaluated sequentially by an MSU-based feature quality metric that considers both redundancy and interaction between candidate features and the already selected feature set. Redundant features are removed by assessing, at low computational cost, how much effective classification information each feature provides. The experimental results show that AP-MSU effectively selects a good feature set on binary and multi-class gene expression datasets and achieves good classification performance with different classifiers. In addition, the lower-dimensional feature subset obtained by the algorithm can improve classification accuracy.
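As a rough illustration of the quantity the abstract relies on, the sketch below computes a multivariate symmetric uncertainty for discrete variables. It assumes the commonly cited definition MSU(X_1, ..., X_n) = n/(n-1) · [1 − H(X_1, ..., X_n) / Σ H(X_i)], which reduces to ordinary symmetric uncertainty for n = 2. The function names `entropy` and `msu` are hypothetical, and this is not the authors' implementation; the AP clustering and filter screening steps are not shown.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))          # one tuple per sample
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def msu(*columns):
    """Multivariate symmetric uncertainty of n >= 2 discrete variables.

    Assumed definition: n/(n-1) * (1 - H(X_1..X_n) / sum_i H(X_i)).
    """
    n = len(columns)
    if n < 2:
        raise ValueError("msu needs at least two variables")
    marginal_sum = sum(entropy(col) for col in columns)
    if marginal_sum == 0:               # all variables constant: no information
        return 0.0
    return (n / (n - 1)) * (1.0 - entropy(*columns) / marginal_sum)


# Toy XOR data: neither feature alone tells us anything about y,
# but the two features together determine it.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [0, 1, 1, 0]

pairwise_scores = (msu(x1, y), msu(x2, y))   # both are zero on this data
joint_score = msu(x1, x2, y)                 # about 0.5 on this data
```

On this toy data the pairwise scores vanish while the three-variable score is roughly 0.5, which is the kind of feature interaction a purely pairwise relevance measure would miss.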

Acknowledgment

This work was supported by the National Natural Science Foundation of China-Joint Fund for Enterprises-Key Support Program Project, under Grant U22B2049.

Author information

Correspondence to Changyin Zhou.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Wang, W., Zhou, C. (2024). Research on Feature Selection Methods Based on Feature Clustering and Information Theory. In: Huang, DS., Chen, W., Zhang, Q. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14874. Springer, Singapore. https://doi.org/10.1007/978-981-97-5618-6_7

  • DOI: https://doi.org/10.1007/978-981-97-5618-6_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5617-9

  • Online ISBN: 978-981-97-5618-6

  • eBook Packages: Computer Science, Computer Science (R0)