
Feature selection, mutual information, and the classification of high-dimensional patterns

Applications to image classification and microarray data analysis

  • Theoretical Advances
  • Published in: Pattern Analysis and Applications

Abstract

We propose a novel feature selection filter for supervised learning which relies on the efficient estimation of the mutual information between a high-dimensional set of features and the classes. We bypass the estimation of the probability density function with the aid of the entropic-graph approximation of Rényi entropy and the subsequent approximation of the Shannon entropy. Thus, the complexity depends not on the number of dimensions but on the number of patterns/samples, and the curse of dimensionality is circumvented. We show that it is then possible to outperform algorithms that rank features individually, as well as a greedy algorithm based on the maximal-relevance, minimal-redundancy (mRMR) criterion. We successfully test our method in the contexts of both image classification and microarray data classification. For most of the tested data sets, we obtain better classification results than those reported in the literature.
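
To make the abstract's pipeline concrete, the following Python sketch shows an MST-based ("entropic graph") estimator of Rényi entropy and the set-wise mutual-information measure it enables. Everything here is an illustrative assumption rather than the authors' implementation: the function names, the fixed α near 1 used as a crude stand-in for the paper's Rényi-to-Shannon extrapolation, and the unit bias constant β (which cancels exactly in the mutual-information difference).

```python
# Hypothetical sketch of an entropic-graph (MST) Renyi entropy estimator
# and the mutual-information filter it enables; not the authors' code.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def renyi_entropy_mst(X, alpha=0.99):
    """Estimate the alpha-Renyi entropy of samples X (n x d) from the
    gamma-weighted length of their Euclidean minimum spanning tree."""
    n, d = X.shape
    gamma = d * (1.0 - alpha)                 # edge-weight exponent
    mst = minimum_spanning_tree(squareform(pdist(X)))
    length = np.power(mst.data, gamma).sum()  # L_gamma(X): sum of |e|^gamma
    beta = 1.0  # bias constant beta_{L,gamma}; unknown here, but it cancels
                # exactly in the mutual-information difference below
    return (np.log(length / n**alpha) - np.log(beta)) / (1.0 - alpha)

def mutual_information(X, y, alpha=0.99):
    """I(S; C) = H(S) - sum_c p(c) H(S | C = c), with each entropy term
    replaced by its MST estimate. Needs at least two samples per class."""
    h_cond = sum(np.mean(y == c) * renyi_entropy_mst(X[y == c], alpha)
                 for c in np.unique(y))
    return renyi_entropy_mst(X, alpha) - h_cond
```

A forward greedy search would then, at each step, add the candidate feature f maximizing mutual_information(X[:, selected + [f]], y), so the estimator is always applied to the growing subset as a whole rather than to individual features; the MST computation scales with the number of samples rather than the subset dimension, which is the point of the approximation.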



Notes

  1. The LOOCV (leave-one-out cross-validation) measure is used when the number of samples is too small for a separate test set to be built. It consists of training all possible classifiers, each time leaving out a single sample for testing, and averaging the held-out results. Note that in this work it is used only for evaluating the results, not as a selection criterion; a minimal sketch is given after these notes.

  2. Datasets can be downloaded from the Broad Institute http://www.broad.mit.edu/, Stanford Genomic Resources http://genome-www.stanford.edu/, and Princeton University http://microarray.princeton.edu/.
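
For concreteness, here is a minimal sketch of the LOOCV protocol described in note 1. The use of scikit-learn and of a 1-NN classifier are assumptions for illustration; the note specifies neither the classifier nor the tooling.

```python
# A minimal LOOCV evaluation sketch, assuming scikit-learn and 1-NN.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

def loocv_accuracy(X, y):
    """Train n classifiers, each leaving out exactly one sample for testing,
    and return the fraction of held-out samples classified correctly."""
    X, y = np.asarray(X), np.asarray(y)
    hits = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = KNeighborsClassifier(n_neighbors=1)
        clf.fit(X[train_idx], y[train_idx])
        hits += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
    return hits / len(y)
```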


Acknowledgments

This research was funded by project DPI2005-01280 of the Spanish Government.

Author information

Correspondence to Boyan Bonev.


About this article

Cite this article

Bonev, B., Escolano, F. & Cazorla, M. Feature selection, mutual information, and the classification of high-dimensional patterns. Pattern Anal Applic 11, 309–319 (2008). https://doi.org/10.1007/s10044-008-0107-0
