Feature Selection Method Based on Differential Correlation Information Entropy

Neural Processing Letters

Abstract

Feature selection is one of the major aspects of pattern classification systems. In earlier work, Ding and Peng recognized its importance and proposed the minimum redundancy feature selection method, which sequentially selects features while minimizing redundancy among them, for microarray gene expression data. However, because that method relies on mutual information to measure pairwise dependency between random variables, its results cannot be optimal: the merit of the feature subset as a whole is never assessed. Therefore, within the minimum redundancy-maximum correlation framework, this paper introduces entropy as a global measure of the selected subset and proposes a new subset evaluation criterion, differential correlation information entropy. Different bivariate correlation metrics can be plugged into the criterion, and feature selection is then carried out by sequential forward search. Two different classification models are applied to eleven standard data sets from the UCI machine learning repository to compare our method with established algorithms such as mRMR, reliefF, and feature selection with joint maximal information entropy. The experimental results show that feature selection based on the proposed method is clearly superior to the other models.
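
The exact definition of differential correlation information entropy appears only in the full text, but the abstract pins down the shape of the recipe: pick a bivariate correlation metric, score candidate subsets with an entropy-based global measure, and grow the subset by sequential forward search. The following is a minimal sketch of that shape only, assuming the subset score is the eigenvalue entropy of the feature correlation matrix; `correlation_entropy`, `relevance`, the additive combination, and every name here are our illustrative assumptions, not the authors' published criterion.

```python
import numpy as np

def correlation_entropy(X_sub):
    """Entropy of the subset's correlation matrix, computed from its
    eigenvalues. A generic redundancy/diversity measure, used here only
    as a stand-in for the paper's DCIE criterion (exact formula is in
    the full text). Assumes non-constant features."""
    n = X_sub.shape[1]
    if n == 1:
        return 1.0                                    # single feature: no redundancy
    R = np.corrcoef(X_sub, rowvar=False)              # bivariate correlation matrix
    lam = np.clip(np.linalg.eigvalsh(R), 0.0, None)   # eigenvalues of R (sum to n)
    p = np.clip(lam / n, 1e-12, None)                 # normalize; guard log(0)
    return float(-np.sum(p * np.log(p)) / np.log(n))  # base-n entropy, in [0, 1]

def relevance(x, y):
    """Feature-class relevance; |Pearson correlation| here, though the
    paper allows different bivariate correlation metrics."""
    return abs(np.corrcoef(x, y)[0, 1])

def sequential_forward_search(X, y, k):
    """Greedy SFS: at each step add the feature whose inclusion maximizes
    class relevance plus subset diversity (higher entropy = less redundancy)."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda j: relevance(X[:, j], y)
                               + correlation_entropy(X[:, selected + [j]]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: 100 samples, 6 features, labels driven by features 0 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, 0] + X[:, 3] > 0).astype(float)
print(sequential_forward_search(X, y, 3))
```

Swapping Pearson correlation for another bivariate metric (Spearman, or the maximal information coefficient of reference 24) changes only the two marked correlation calls, which is the flexibility the abstract claims for the criterion.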

References

  1. Ding C, Peng H (2003) Minimum redundancy feature selection from microarray gene expression data. In: Proceedings of the 2003 IEEE bioinformatics conference (CSB 2003), pp 523–528. https://doi.org/10.1109/CSB.2003.1227396

  2. Soltani M, Shammakhi MH, Khorram S, Sheikhzadeh H (2016) Combined mRMR filter and sparse Bayesian classifier for analysis of gene expression data. In: 2016 2nd international conference of signal processing and intelligent systems (ICSPIS 2016). https://doi.org/10.1109/ICSPIS.2016.7869891

  3. Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238. https://doi.org/10.1109/TPAMI.2005.159

  4. Gu X, Guo J, Xiao L, Ming T, Li C (2019) A feature selection algorithm based on equal interval division and minimal-redundancy-maximal-relevance. Neural Process Lett. https://doi.org/10.1007/s11063-019-10144-3

  5. Zheng K, Wang X (2018) Feature selection method with joint maximal information entropy between features and class. Pattern Recogn 77:20–29. https://doi.org/10.1016/j.patcog.2017.12.008

  6. Zheng K, Wang X, Wu B, Wu T (2020) Feature subset selection combining maximal information entropy and maximal information coefficient. Appl Intell 50(2):487–501. https://doi.org/10.1007/s10489-019-01537-x

  7. Breiman L (2001) Statistical modeling: The two cultures. Stat Sci 16(3):199–215. https://doi.org/10.1214/ss/1009213726

  8. Chen G, Chen J (2015) A novel wrapper method for feature selection and its applications. Neurocomputing 159:219–226. https://doi.org/10.1016/j.neucom.2015.01.070

  9. Maldonado S, Weber R (2009) A wrapper method for feature selection using support vector machines. Inf Sci 179(13):2208–2217. https://doi.org/10.1016/j.ins.2009.02.014

  10. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324. https://doi.org/10.1016/s0004-3702(97)00043-x

  11. Guyon I, Elisseeff A (2006) An introduction to feature extraction. In: Studies in fuzziness and soft computing, vol 207, pp 1–25

  12. Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97(1–2):245–271

  13. Caropreso MF, Matwin S, Sebastiani F (2001) A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization. IGI Global, pp 78–102

  14. Mladenic D, Grobelnik M (1999) Feature selection for unbalanced class distribution and Naive Bayes. In: Proceedings of the international conference on machine learning, pp 258–267

  15. Battiti R (1994) Using mutual information for selecting features in supervised neural-net learning. IEEE Trans Neural Netw 5(4):537–550. https://doi.org/10.1109/72.298224

  16. Yang HH, Moody J (2000) Data visualization and feature selection: new algorithms for non-Gaussian data. In: Advances in neural information processing systems. pp 687–693

  17. Fleuret F (2004) Fast binary feature selection with conditional mutual information. J Mach Learn Res 5:1531–1555

  18. Jakulin A (2005) Machine learning based on attribute interactions. Ph.D. thesis

  19. Meyer PE, Bontempi G (2006) On the use of variable complementarity for feature selection in cancer classification. In: Lecture notes in computer science, vol 3907. Springer, Berlin, pp 91–102

  20. Cadenas JM, Garrido MC, Martinez R (2013) Feature subset selection Filter–Wrapper based on low quality data. Expert Syst Appl 40(16):6241–6252. https://doi.org/10.1016/j.eswa.2013.05.051

  21. Liu Y, Zheng YF (2006) FS_SFS: a novel feature selection method for support vector machines. Pattern Recogn 39(7):1333–1345. https://doi.org/10.1016/j.patcog.2005.10.006

  22. Chyzhyk D, Savio A, Grana M (2014) Evolutionary ELM wrapper feature selection for Alzheimer’s disease CAD on anatomical brain MRI. Neurocomputing 128:73–80. https://doi.org/10.1016/j.neucom.2013.01.065

  23. Erguzel T, Tas C, Cebi M (2015) A wrapper-based approach for feature selection and classification of major depressive disorder-bipolar disorders. Comput Biol Med. https://doi.org/10.1016/j.compbiomed.2015.06.021

  24. Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC (2011) Detecting novel associations in large data sets. Science 334(6062):1518–1524. https://doi.org/10.1126/science.1205438

  25. Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  26. Roffo G, Melzi S, Castellani U, Vinciarelli A (2017) Infinite latent feature selection: a probabilistic latent graph-based ranking approach. In: IEEE international conference on computer vision (ICCV). https://doi.org/10.1109/ICCV.2017.156

  27. Roffo G, Melzi S, Cristani M (2015) Infinite feature selection. In: IEEE international conference on computer vision (ICCV), pp 4202–4210. https://doi.org/10.1109/ICCV.2015.478

  28. Roffo G, Melzi S (2017) Ranking to learn: feature ranking and selection via eigenvector centrality. In: Lecture notes in computer science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol 10312 LNCS, pp 19–35. https://doi.org/10.1007/978-3-319-61461-8_2

  29. Kang S, Ko Y, Seo J (2013) Hierarchical speech-act classification for discourse analysis. Pattern Recognit Lett 34(10):1119–1124. https://doi.org/10.1016/j.patrec.2013.03.008

  30. Roffo G (2017) arXiv preprint, Computer Vision and Pattern Recognition

  31. Kira K, Rendell LA (1992) The feature selection problem: traditional methods and a new algorithm. In: Proceedings of the tenth national conference on artificial intelligence, pp 129–134

  32. Liu C, Wang W, Zhao Q, Shen X, Konan M (2017) A new feature selection method based on a validity index of feature subset. Pattern Recogn Lett 92:1–8. https://doi.org/10.1016/j.patrec.2017.03.018

Acknowledgements

This work was financially supported by the National Key R&D Program of China (Grant No. 2017YFB0802803), the Beijing Natural Science Foundation (4202002) and National College Students Innovation and Entrepreneurship Training Program in BJUT (GJDC-2020-01-09).

Author information

Corresponding author

Correspondence to Yixuan Yan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, X., Yan, Y. & Ma, X. Feature Selection Method Based on Differential Correlation Information Entropy. Neural Process Lett 52, 1339–1358 (2020). https://doi.org/10.1007/s11063-020-10307-7
