Abstract
Feature selection is the process of selecting a subset of important features from the original feature set. Many existing information-theoretic feature selection algorithms concentrate on maximizing relevance and minimizing redundancy. In this paper, relevance and redundancy are extended to conditional relevance and conditional redundancy. By their nature, these two conditional relations tend to capture feature relations more accurately. A new framework integrating the two conditional relations is built, and two new feature selection methods are proposed: Minimum Conditional Relevance-Minimum Conditional Redundancy (MCRMCR) and Minimum Conditional Relevance-Minimum Intra-Class Redundancy (MCRMICR). The proposed methods select features with high class relevance and low redundancy. Experimental results on twelve datasets verify that the proposed methods perform better at feature selection and achieve high classification accuracy.
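The abstract does not reproduce the MCRMCR/MCRMICR scoring formulas, so the sketch below only illustrates the general pattern that conditional-mutual-information feature selectors follow: greedily pick the feature whose relevance to the class, conditioned on the already-selected features, is highest, which implicitly penalizes redundant candidates. It uses a CMIM-style criterion as a stand-in for the paper's own scores; all function names are illustrative, and features are assumed discrete.

```python
# Minimal sketch of greedy feature selection driven by conditional mutual
# information (CMI). This is NOT the paper's exact MCRMCR/MCRMICR criterion;
# it shows the generic conditional-relevance pattern such methods share.

import numpy as np
from collections import Counter

def entropy(seq):
    """Shannon entropy H(X) of a discrete sequence."""
    n = len(seq)
    return -sum((c / n) * np.log2(c / n) for c in Counter(seq).values())

def mutual_info(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def cond_mutual_info(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))

def greedy_cmi_selection(X, y, k):
    """Select k feature indices from discrete feature matrix X (n x d).

    CMIM-style score: a candidate f is rated by min over selected s of
    I(f; y | s), so a feature redundant with any selected one scores low.
    """
    n_features = X.shape[1]
    # Seed with the single most class-relevant feature: argmax I(f; y).
    selected = [max(range(n_features),
                    key=lambda j: mutual_info(X[:, j], y))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Conditional relevance to the class, conditioned on each
            # already-selected feature; take the worst case.
            score = min(cond_mutual_info(X[:, j], y, X[:, s])
                        for s in selected)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

Count-based entropy estimates like these assume discrete inputs, so continuous features should be discretized (e.g., binned into quantiles) before applying the selector.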





Acknowledgments
The corresponding author gratefully acknowledges the support of the National Natural Science Foundation of China under Grant 61402363, the Education Department of Shaanxi Province Key Laboratory Project under Grant 15JS079, the Xi'an Science Program Project under Grant 2017080CG/RC043(XALG017), the Ministry of Education of Shaanxi Province Research Project under Grant 17JK0534, and the Beilin District of Xi'an Science and Technology Project under Grant GX1625.
Cite this article
Zhou, H., Zhang, Y., Zhang, Y. et al. Feature selection based on conditional mutual information: minimum conditional relevance and minimum conditional redundancy. Appl Intell 49, 883–896 (2019). https://doi.org/10.1007/s10489-018-1305-0