Abstract
Risk stratification of cancer patients, that is, the prediction of disease outcome on an individual basis, is a key ingredient in making therapeutic decisions. In recent years, gene expression profiling has been successfully introduced alongside the clinical and histological criteria traditionally used for such predictions. Many research groups have proposed and tested gene expression signatures: sets of genes whose expression values in a tumor can be used to predict the outcome of the disease. One well-known example is the 70-gene signature, on which we recently tested several machine learning techniques in order to maximize its predictive power. Genetic Programming (GP) was shown to perform significantly better than other techniques, including Support Vector Machines, Multilayer Perceptrons, and Random Forests, in classifying patients. Compared with these methods, GP has the further advantage of performing automatic feature selection. Importantly, by defining the fitness as a weighted average of false positives and false negatives, we showed that GP can outperform all the other methods in minimizing false negatives (one of the main goals in clinical applications) without compromising the overall minimization of incorrectly classified instances. The solutions returned by GP are also appealing from a clinical point of view: they are simple, easy to understand, and built from a rather limited subset of the available features.
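The weighted fitness mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula or weights, so everything below is an assumption made for illustration: the function name `weighted_error_fitness`, the label encoding (1 = poor outcome, 0 = good outcome), the use of raw counts rather than rates, and the default weight of 0.75 on false negatives are all hypothetical.

```python
from typing import Sequence


def weighted_error_fitness(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    fn_weight: float = 0.75,  # hypothetical weight, not taken from the chapter
) -> float:
    """Weighted average of false-negative and false-positive counts.

    Lower is better. Labels are assumed to be 1 (poor outcome, the
    "positive" class) and 0 (good outcome); this encoding is an assumption.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not 0.0 <= fn_weight <= 1.0:
        raise ValueError("fn_weight must lie in [0, 1]")

    # False positive: predicted poor outcome, actual good outcome.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # False negative: predicted good outcome, actual poor outcome.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    return fn_weight * fn + (1.0 - fn_weight) * fp


if __name__ == "__main__":
    actual = [1, 1, 0, 0, 1]
    predicted = [1, 0, 1, 0, 0]
    # With fn_weight = 0.5 both error types count equally;
    # raising fn_weight penalizes missed poor-outcome patients more heavily.
    print(weighted_error_fitness(actual, predicted, fn_weight=0.5))
    print(weighted_error_fitness(actual, predicted, fn_weight=0.75))
```

Setting `fn_weight` above 0.5 steers the search toward classifiers that miss fewer poor-outcome patients, at the cost of tolerating more false alarms. With `fn_weight = 0.5` the measure is proportional to the plain count of misclassified instances.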
Keywords
- Support Vector Machine
- Fitness Function
- Genetic Programming
- Machine Learning Method
- Radial Basis Function Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
An erratum to this chapter can be found at http://dx.doi.org/10.1007/978-3-642-37577-4_18
Acknowledgments
This work was partially supported by the Neuroscience Program of the Compagnia di San Paolo in Torino.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Giacobini, M., Provero, P., Vanneschi, L., Mauri, G. (2014). Towards the Use of Genetic Programming for the Prediction of Survival in Cancer. In: Cagnoni, S., Mirolli, M., Villani, M. (eds) Evolution, Complexity and Artificial Life. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37577-4_12
DOI: https://doi.org/10.1007/978-3-642-37577-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37576-7
Online ISBN: 978-3-642-37577-4
eBook Packages: Computer Science, Computer Science (R0)