Simulated annealing for supervised gene selection

Filippone, Maurizio; Masulli, Francesco; Rovetta, Stefano

doi:10.1007/s00500-010-0597-8

Focus
Published: 31 March 2010

Volume 15, pages 1471–1482, (2011)
Soft Computing

Maurizio Filippone^1,
Francesco Masulli^2,3,4 &
Stefano Rovetta^2,3

Abstract

Genomic data, and more generally biomedical data, are often characterized by high dimensionality. An input selection procedure can attain the two objectives of highlighting the relevant variables (genes) and possibly improving classification results. In this paper, we propose a wrapper approach to gene selection in classification of gene expression data using simulated annealing along with supervised classification. The proposed approach can perform global combinatorial searches through the space of all possible input subsets, can handle cases with numerical, categorical or mixed inputs, and is able to find (sub-)optimal subsets of inputs giving low classification errors. The method has been tested on publicly available bioinformatics data sets using support vector machines and on a mixed type data set using classification trees. We also propose some heuristics able to speed up the convergence. The experimental results highlight the ability of the method to select minimal sets of relevant features.
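
The wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a nearest-centroid classifier in place of the support vector machines and classification trees used in the paper, synthetic two-class data, a single-feature flip as the move, geometric cooling, and a small per-feature penalty to favour small subsets. All function names and parameter values (`make_data`, `cost`, `anneal`, `t0`, `cooling`, `penalty`) are hypothetical.

```python
import math
import random

def make_data(n_samples=60, n_features=20, informative=(0, 1, 2), seed=0):
    """Synthetic two-class data: only the `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 if label else -3.0
        X.append(row)
        y.append(label)
    return X, y

def centroid_error(X_train, y_train, X_test, y_test, subset):
    """Error rate of a nearest-centroid classifier restricted to `subset`."""
    cols = sorted(subset)
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        cents[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    wrong = 0
    for x, t in zip(X_test, y_test):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, cents[c]))
                for c in cents}
        wrong += min(dist, key=dist.get) != t
    return wrong / len(y_test)

def cost(X, y, subset, folds=3, penalty=0.01):
    """Cross-validated error plus an assumed penalty per selected feature."""
    if not subset:
        return 1.0  # assumed worst-case cost, to discourage the empty subset
    err = 0.0
    for k in range(folds):
        test = [i for i in range(len(y)) if i % folds == k]
        train = [i for i in range(len(y)) if i % folds != k]
        err += centroid_error([X[i] for i in train], [y[i] for i in train],
                              [X[i] for i in test], [y[i] for i in test],
                              subset)
    return err / folds + penalty * len(subset)

def anneal(X, y, n_features, init=None, steps=400, t0=0.2, cooling=0.99, seed=1):
    """Simulated annealing over feature subsets with Metropolis acceptance."""
    rng = random.Random(seed)
    if init is None:
        init = {j for j in range(n_features) if rng.random() < 0.5}
    current, current_cost = set(init), cost(X, y, init)
    best, best_cost = set(current), current_cost
    temp = t0
    for _ in range(steps):
        candidate = set(current)
        candidate ^= {rng.randrange(n_features)}  # flip one feature in or out
        cand_cost = cost(X, y, candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with prob. exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = set(current), current_cost
        temp *= cooling  # geometric cooling schedule (an assumption)
    return best, best_cost

X, y = make_data()
best, best_cost = anneal(X, y, n_features=20)
print(sorted(best), round(best_cost, 3))
```

Because the classifier is called only through `cost`, any supervised learner could be substituted there, which is what makes the approach a wrapper; the same structure would also accommodate categorical or mixed inputs if the classifier supports them.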


Acknowledgments

We thank Chih-Chung Chang for help with the internals of LIBSVM in R. A discussion with Giorgio Valentini helped us clarify an important issue in this paper. This work was funded by the Italian Ministry of Education, University and Research (code 2004062740).

Author information

Authors and Affiliations

Department of Computing Science, University of Glasgow, Sir Alwyn Williams Building, G12 8QQ, Glasgow, UK
Maurizio Filippone
Department of Computer and Information Sciences, University of Genova, Genoa, Italy
Francesco Masulli & Stefano Rovetta
CNISM Genova Research Unit, Genoa, Italy
Francesco Masulli & Stefano Rovetta
Sbarro Institute for Cancer Research and Molecular Medicine, Center for Biotechnology, Temple University, Philadelphia, PA, USA
Francesco Masulli

Corresponding author

Correspondence to Francesco Masulli.

About this article

Cite this article

Filippone, M., Masulli, F. & Rovetta, S. Simulated annealing for supervised gene selection. Soft Comput 15, 1471–1482 (2011). https://doi.org/10.1007/s00500-010-0597-8

Published: 31 March 2010
Issue Date: August 2011
DOI: https://doi.org/10.1007/s00500-010-0597-8
