Abstract
A new filter method is proposed for feature selection and ranking that incorporates a novel mutual information with Gaussian gain for evaluating the relationships between features and the class, as well as between features themselves. The new mutual information is derived, following the axioms of classical information theory, from the recently introduced non-extensive entropy with Gaussian gain. The defining characteristic of this entropy is its non-linearity when representing correlated information in natural texture images characterized by sparse probability distributions. In this work, we trace this property in our new mutual information in the context of correlated random variables associated with real-world datasets. The non-linearity of the Gaussian function embedded in the mutual information formula is exploited to identify the most important features, in the correct rank order, right at the outset of the incremental feature selection algorithm. This leads to the formation of smaller ranked feature subsets that yield the highest classification accuracies. Extensive experimentation on twenty benchmark datasets from the UCI repository, along with comparison with the state of the art, confirms the efficacy of our approach. An automated selection of the optimum feature subset, based on a simple statistical test on the new measure, is also proposed.
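The abstract outlines an incremental (greedy forward) selection loop driven by the new measure. As a rough, minimal sketch of such a loop, the Python snippet below implements a standard Battiti-style criterion (relevance to the class minus weighted redundancy with already selected features); the classical mutual information estimators from scikit-learn stand in for the paper's Gaussian-gain mutual information, and the `beta` weight and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def pairwise_mi(a, b):
    """Classical MI estimate between two feature columns; the paper's
    Gaussian-gain mutual information would be substituted here."""
    return mutual_info_regression(a.reshape(-1, 1), b)[0]

def incremental_selection(X, y, k, beta=0.5):
    """Greedy forward selection: at each step add the feature maximizing
    relevance to the class minus redundancy with features already chosen."""
    relevance = mutual_info_classif(X, y)            # I(feature; class)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        scores = [relevance[f]
                  - beta * sum(pairwise_mi(X[:, f], X[:, s]) for s in selected)
                  for f in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)                        # order of addition = rank
        remaining.remove(best)
    return selected                                  # feature indices, ranked

```

Because the returned indices are in rank order, truncating the list at increasing lengths yields the nested ranked feature subsets whose classification accuracies the paper evaluates.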