Abstract
We propose a novel feature selection filter for supervised learning which relies on the efficient estimation of the mutual information between a high-dimensional set of features and the classes. We bypass explicit estimation of the probability density function by means of the entropic-graphs approximation of the Rényi entropy, from which the Shannon entropy is subsequently approximated. The complexity therefore depends on the number of patterns/samples rather than on the number of dimensions, and the curse of dimensionality is circumvented. We show that it is then possible to outperform algorithms which rank features individually, as well as a greedy algorithm based on the maximal-relevance minimal-redundancy criterion. We successfully test our method in the contexts of both image classification and microarray data classification. For most of the tested data sets, we obtain better classification results than those reported in the literature.
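The entropic-graphs estimator mentioned in the abstract builds a minimal spanning tree (MST) over the samples and uses its total edge length to approximate the Rényi entropy without estimating the density. The sketch below follows the general form of the Hero–Michel MST estimator; the bias constant `beta` and the choice of `alpha` are illustrative assumptions, not values taken from this paper:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def renyi_entropy_mst(X, alpha=0.5, beta=1.0):
    """MST-based (entropic spanning graph) estimate of the Renyi entropy.

    With gamma = d * (1 - alpha) and L_gamma the sum of MST edge lengths
    raised to gamma, the estimator has the form
        H_alpha ~ (log(L_gamma / n**alpha) - log(beta)) / (1 - alpha),
    where beta is a density-independent bias constant (assumed given).
    alpha must lie in (0, 1) so that gamma > 0.
    """
    n, d = X.shape
    gamma = d * (1.0 - alpha)
    dists = squareform(pdist(X))              # pairwise Euclidean distances
    mst = minimum_spanning_tree(dists)        # sparse matrix with n-1 edges
    L_gamma = np.sum(mst.data ** gamma)       # gamma-weighted MST length
    return (np.log(L_gamma / n ** alpha) - np.log(beta)) / (1.0 - alpha)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # 200 samples, 5 features
print(renyi_entropy_mst(X, alpha=0.5))
```

Because only pairwise distances between the n samples are needed, the cost grows with the number of samples rather than with the dimensionality, which is what makes the approach viable for high-dimensional feature sets.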
Notes
The leave-one-out cross-validation (LOOCV) measure is used when the number of samples is so small that a separate test set cannot be built. It consists of building all possible classifiers, each time leaving exactly one sample out for testing. Note that in this work it is used only for evaluating the results, not as a selection criterion.
Datasets can be downloaded from the Broad Institute http://www.broad.mit.edu/, Stanford Genomic Resources http://genome-www.stanford.edu/, and Princeton University http://microarray.princeton.edu/.
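The LOOCV procedure described in the note above can be sketched as follows; the 1-nearest-neighbour classifier is an illustrative stand-in, not the classifier used in the paper:

```python
import numpy as np

def loocv_accuracy(X, y):
    """Leave-one-out cross-validation with a 1-nearest-neighbour classifier.

    Each sample is held out once; the classifier is fit on the remaining
    n - 1 samples and tested on the held-out one.  The resulting accuracy
    evaluates a feature subset; it is not used as a selection criterion.
    """
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i                       # leave sample i out
        dists = np.linalg.norm(X[mask] - X[i], axis=1)
        pred = y[mask][np.argmin(dists)]               # nearest neighbour's label
        correct += int(pred == y[i])
    return correct / n

# two well-separated classes, so LOOCV accuracy should be perfect
rng = np.random.default_rng(1)
X = np.vstack([np.zeros((10, 3)), np.full((10, 3), 5.0)])
X += rng.normal(scale=0.1, size=X.shape)
y = np.array([0] * 10 + [1] * 10)
print(loocv_accuracy(X, y))  # prints 1.0
```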
Acknowledgments
This research is funded by the project DPI2005-01280 from the Spanish Government.
Cite this article
Bonev, B., Escolano, F. & Cazorla, M. Feature selection, mutual information, and the classification of high-dimensional patterns. Pattern Anal Applic 11, 309–319 (2008). https://doi.org/10.1007/s10044-008-0107-0