Abstract
In classification, feature selection is an important pre-processing step that simplifies the dataset and improves the quality of the data representation, making classifiers more accurate, easier to train, and easier to understand. Because of its ability to capture non-linear interactions between features, mutual information has been widely applied to feature selection. Alongside the counting approach, the traditional way to calculate mutual information, many estimation methods have been proposed that allow mutual information to work directly on continuous datasets. This work compares the effect of the counting approach and the kernel density estimation (KDE) approach in feature selection, using particle swarm optimisation as the search mechanism. Experimental results on 15 different datasets show that KDE works well on both continuous and discrete datasets. In addition, feature subsets evolved with KDE achieve similar or better classification performance than those evolved with the counting approach. Furthermore, results on artificial datasets with various feature interactions show that KDE correctly captures interactions between features, in terms of both relevance and redundancy, which cannot be achieved with the counting approach.
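The two mutual information estimators being compared can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `counting_mi` discretises each variable into a fixed number of bins and sums over the joint histogram, while `kde_mi` plugs Gaussian kernel density estimates into the sample average of the log density ratio. The bin count and the rule-of-thumb bandwidth are illustrative choices, not values taken from the paper.

```python
import numpy as np

def counting_mi(x, y, bins=5):
    """Counting (histogram) approach: discretise both variables,
    then sum p(x,y) * log(p(x,y) / (p(x) p(y))) over joint bins."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, column vector
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, row vector
    nz = pxy > 0                          # avoid log(0) on empty bins
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def kde_mi(x, y):
    """KDE approach: estimate densities with Gaussian kernels and
    average log p(x,y) / (p(x) p(y)) over the samples."""
    n = len(x)
    h = n ** (-1.0 / 6.0)                 # rule-of-thumb bandwidth (2-D)
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()
    dx = (xs[:, None] - xs[None, :]) / h  # pairwise scaled differences
    dy = (ys[:, None] - ys[None, :]) / h
    pxy = np.exp(-0.5 * (dx**2 + dy**2)).mean(axis=1) / (2 * np.pi * h**2)
    px = np.exp(-0.5 * dx**2).mean(axis=1) / (np.sqrt(2 * np.pi) * h)
    py = np.exp(-0.5 * dy**2).mean(axis=1) / (np.sqrt(2 * np.pi) * h)
    return float(np.mean(np.log(pxy / (px * py))))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + 0.5 * rng.normal(size=500)        # strongly dependent on x
z = rng.normal(size=500)                  # independent of x
print("counting:", counting_mi(x, y), counting_mi(x, z))
print("kde:     ", kde_mi(x, y), kde_mi(x, z))
```

Both estimators should report much higher mutual information for the dependent pair (x, y) than for the independent pair (x, z); the counting estimate for the independent pair stays slightly above zero because of the finite-sample bias of histogram estimates.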

Acknowledgments
We thank A/Prof Ivy Liu from the School of Mathematics and Statistics, Victoria University of Wellington, who provided insight into the ANOVA test and helped us analyse the experimental results.
Cite this article
Nguyen, H.B., Xue, B. & Andreae, P. Mutual information for feature selection: estimation or counting?. Evol. Intel. 9, 95–110 (2016). https://doi.org/10.1007/s12065-016-0143-4