
Feature ranking and best feature subset using mutual information

  • Original Article
  • Published in Neural Computing & Applications 13, 175–184 (2004)

Abstract

A new algorithm for ranking input features and obtaining the best feature subset is developed and illustrated in this paper. The feature selection algorithm is built on the asymptotic formula for mutual information and the expectation maximisation (EM) algorithm. We consider not only the dependence between the features and the class, but also the dependence among the features themselves. The algorithm also performs well on noisy data. An empirical study compares the proposed algorithm with existing algorithms, and the proposed algorithm is illustrated by application to a variety of problems.
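The abstract does not give the algorithm's details, but the general scheme it describes — scoring each feature by its mutual information with the class while penalising dependence among the features — can be sketched as follows. This is a minimal illustration in the spirit of MIFS-style greedy selection (Battiti, 1994), not the authors' asymptotic-formula/EM method; the histogram MI estimator, the bin count, and the redundancy weight beta are illustrative assumptions.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Estimate I(X; Y) in nats from a 2-D histogram of the samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x), shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y), shape (1, bins)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def rank_features(X, y, beta=0.5, bins=10):
    """Greedy ranking: relevance I(x_j; class) minus beta * redundancy
    with the features already selected (MIFS-style, an assumption here)."""
    relevance = [mutual_information(X[:, j], y, bins) for j in range(X.shape[1])]
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        scores = {j: relevance[j]
                     - beta * sum(mutual_information(X[:, j], X[:, k], bins)
                                  for k in selected)
                  for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy check: feature 0 drives the class, feature 1 nearly duplicates it,
# and feature 2 is pure noise. The near-duplicate is typically demoted
# because its redundancy with feature 0 offsets its relevance.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
X = np.column_stack([x0, x0 + 0.1 * rng.normal(size=500),
                     rng.normal(size=500)])
y = (x0 > 0).astype(float)
print(rank_features(X, y))
```

The redundancy penalty is what distinguishes this family of methods from ranking by relevance alone: a feature that merely duplicates an already-selected feature scores poorly even if its own mutual information with the class is high.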




Acknowledgements

We wish to thank Julia Sonander and Harri Howells of National Air-Traffic Services for the STCA data, and the Engineering and Physical Science Research Council of the UK for supporting this work (grant no. GR/M75143).

Author information

Corresponding author

Correspondence to Shuang Cang.


About this article

Cite this article

Cang, S., Partridge, D. Feature ranking and best feature subset using mutual information. Neural Comput & Applic 13, 175–184 (2004). https://doi.org/10.1007/s00521-004-0400-9


