A genetic algorithm-based method for feature subset selection

Tan, Feng; Fu, Xuezheng; Zhang, Yanqing; Bourgeois, Anu G.

doi:10.1007/s00500-007-0193-8

Published: 31 May 2007

Volume 12, pages 111–120, (2008)
Soft Computing

Abstract

As a commonly used technique in data preprocessing, feature selection selects a subset of informative attributes or variables to build models describing data. By removing redundant, irrelevant, or noisy features, feature selection can improve the predictive accuracy and the comprehensibility of predictors or classifiers. Many feature selection algorithms with different selection criteria have been introduced by researchers. However, no single criterion has been found to be best for all applications. In this paper, we propose a framework based on a genetic algorithm (GA) for feature subset selection that combines various existing feature selection methods. The advantages of this approach include the ability to accommodate multiple feature selection criteria and to find small subsets of features that perform well for the particular inductive learning algorithm used to build the classifier. We conducted experiments using three data sets and three existing feature selection methods. The experimental results demonstrate that our approach is robust and effective at finding subsets of features with higher classification accuracy and/or smaller size than those found by each individual feature selection algorithm.
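
The abstract does not specify the encoding, operators, or fitness function, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a binary mask per feature, an initial population seeded with subsets proposed by other feature selection methods, and a fitness that trades leave-one-out accuracy of a simple nearest-centroid classifier against subset size. All function names, the toy data, the size weight, and the GA parameters are hypothetical choices made for illustration.

```python
import random


def make_toy_data(n_samples=30, n_features=8, seed=1):
    """Synthetic two-class data: features 0 and 1 carry signal, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[2.0 * label + rng.gauss(0, 0.5) if j < 2 else rng.gauss(0, 1.0)
          for j in range(n_features)] for label in y]
    return X, y


def loo_accuracy(mask, X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Per-class feature sums and counts, computed without sample i.
        sums, counts = {}, {}
        for k, (row, label) in enumerate(zip(X, y)):
            if k == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for c, j in enumerate(cols):
                acc[c] += row[j]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared Euclidean distance from sample i to the class centroid.
            return sum((X[i][j] - sums[label][c] / counts[label]) ** 2
                       for c, j in enumerate(cols))

        if min(sums, key=dist) == y[i]:
            correct += 1
    return correct / len(X)


def fitness(mask, X, y, size_weight=0.1):
    """Reward accuracy and penalise subset size (size_weight is an assumed value)."""
    return loo_accuracy(mask, X, y) - size_weight * sum(mask) / len(mask)


def ga_select(X, y, seed_masks, pop_size=20, generations=15, p_mut=0.05, seed=0):
    """Evolve binary feature masks, starting from subsets given by other methods."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def score(mask):
        # Memoise fitness so each distinct mask is evaluated only once.
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, X, y)
        return cache[key]

    # Seed with externally proposed subsets, then fill up with random masks.
    pop = [list(m) for m in seed_masks][:pop_size]
    while len(pop) < pop_size:
        pop.append([rng.randint(0, 1) for _ in range(n)])

    for _ in range(generations):
        nxt = [max(pop, key=score)[:]]  # elitism: carry over the current best
        while len(nxt) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            a = max(rng.sample(pop, 3), key=score)
            b = max(rng.sample(pop, 3), key=score)
            # Uniform crossover followed by bit-flip mutation.
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)
```

Calling `ga_select(X, y, seed_masks)` with `X, y = make_toy_data()` and a few hand-written masks standing in for the output of existing selection methods returns one binary mask. Because the best individual is copied into each new generation, the returned subset never scores lower than the best seed under this fitness.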

Author information

Authors and Affiliations

Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
Feng Tan, Xuezheng Fu, Yanqing Zhang & Anu G. Bourgeois

Corresponding author

Correspondence to Feng Tan.

About this article

Cite this article

Tan, F., Fu, X., Zhang, Y. et al. A genetic algorithm-based method for feature subset selection. Soft Comput 12, 111–120 (2008). https://doi.org/10.1007/s00500-007-0193-8

Issue Date: January 2008
DOI: https://doi.org/10.1007/s00500-007-0193-8
