Smaller feature subset selection for real-world datasets using a new mutual information with Gaussian gain

Published in Multidimensional Systems and Signal Processing

Abstract

A new filter method is proposed for feature selection and ranking that incorporates a novel mutual information with Gaussian gain to evaluate the relationships between features and the class, and between the features themselves. The new mutual information is derived, following the axioms of classical information theory, from the recently introduced non-extensive entropy with Gaussian gain. The defining characteristic of this entropy is its non-linearity when representing correlated information in natural texture images described by sparse probability distributions. In this work, we trace this property in our new mutual information in the context of correlated random variables drawn from real-world datasets. The non-linearity of the Gaussian function embedded in the mutual information formula is exploited to identify the most important features in the correct rank order right at the outset of the incremental feature selection algorithm. This leads to the formation of smaller ranked feature subsets that yield the highest classification accuracies. Extensive experiments on twenty benchmark datasets from the UCI repository, together with comparisons against state-of-the-art methods, confirm the efficacy of the approach. An automated selection of the optimum feature subset is also proposed, based on a simple statistical test on the new measure.
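The abstract describes an incremental filter-style ranking driven by feature–class relevance and feature–feature redundancy, with the Gaussian-gain mutual information as the scoring measure. As a rough illustration only, the sketch below implements a generic greedy relevance-minus-redundancy ranking of the kind common in mutual-information filter methods (e.g., MIFS, mRMR). Classical Shannon mutual information, computed via scikit-learn's `mutual_info_score`, stands in for the paper's Gaussian-gain measure, whose formula is not reproduced here; the redundancy weight `beta` and the pre-discretization of features are likewise assumptions, not the authors' method.

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def greedy_mi_ranking(X, y, beta=0.5):
    """Greedy relevance-minus-redundancy feature ranking (illustrative sketch).

    Shannon mutual information is used as a stand-in for the paper's
    Gaussian-gain measure. X must hold discretized feature values,
    shape (n_samples, n_features); y holds the class labels.
    """
    n_features = X.shape[1]
    # Relevance of each feature to the class labels.
    relevance = np.array([mutual_info_score(X[:, j], y)
                          for j in range(n_features)])
    selected, remaining = [], list(range(n_features))
    while remaining:
        scores = []
        for j in remaining:
            # Average redundancy with the features already selected.
            redundancy = (np.mean([mutual_info_score(X[:, j], X[:, s])
                                   for s in selected])
                          if selected else 0.0)
            scores.append(relevance[j] - beta * redundancy)  # assumed weighting
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected  # feature indices in rank order; truncate for a subset
```

In methods of this family, the ranked list is then cut off at the subset size that maximizes cross-validated classification accuracy, or, as this paper proposes, at a point decided automatically by a simple statistical test on the measure itself.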

Author information

Corresponding author

Correspondence to Seba Susan.

About this article

Cite this article

Susan, S., Hanmandlu, M. Smaller feature subset selection for real-world datasets using a new mutual information with Gaussian gain. Multidim Syst Sign Process 30, 1469–1488 (2019). https://doi.org/10.1007/s11045-018-0612-2
