Towards the Use of Genetic Programming for the Prediction of Survival in Cancer

Giacobini, Marco; Provero, Paolo; Vanneschi, Leonardo; Mauri, Giancarlo

doi:10.1007/978-3-642-37577-4_12

Abstract

Risk stratification of cancer patients, that is, the prediction of the outcome of the pathology on an individual basis, is a key ingredient in making therapeutic decisions. In recent years, gene expression profiling has been successfully introduced in combination with the clinical and histological criteria traditionally used for such predictions. Many research groups have introduced and tested sets of genes whose expression values in a tumor can be used to predict the outcome of the pathology (gene expression signatures). One well-known signature is the 70-gene signature, on which we recently tested several machine learning techniques in order to maximize its predictive power. Genetic Programming (GP) was shown to perform significantly better than other techniques, including Support Vector Machines, Multilayer Perceptrons, and Random Forests, in classifying patients. GP has the further advantage over other methods of performing automatic feature selection. Importantly, by using a weighted average of false positives and false negatives in the definition of the fitness, we showed that GP can outperform all the other methods in minimizing false negatives (one of the main goals in clinical applications) without compromising the overall minimization of incorrectly classified instances. The solutions returned by GP are also appealing from a clinical point of view, being simple, easy to understand, and built from a rather limited subset of the available features.
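
The fitness idea mentioned in the abstract, a weighted average of false positives and false negatives, can be illustrated with a minimal sketch. The abstract does not give the exact formula or weights, so everything below is an assumption made for illustration: the function name `weighted_error_fitness`, the parameter `fn_weight`, its default value, and the label convention (1 = poor outcome, 0 = good outcome) are hypothetical and are not taken from the chapter.

```python
from typing import Sequence


def weighted_error_fitness(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    fn_weight: float = 0.75,  # assumed value, not from the chapter
) -> float:
    """Weighted average of false negatives and false positives.

    Lower is better. Labels are assumed to be 1 (poor outcome) and
    0 (good outcome); this convention is an illustration only.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not 0.0 <= fn_weight <= 1.0:
        raise ValueError("fn_weight must lie in [0, 1]")

    # A false positive predicts a poor outcome for a good-outcome patient.
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # A false negative misses a patient who actually had a poor outcome.
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Weights sum to 1, so fn_weight > 0.5 penalizes false negatives more.
    return fn_weight * false_neg + (1.0 - fn_weight) * false_pos


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 1]
    preds = [1, 0, 1, 0, 0]
    print(weighted_error_fitness(truth, preds))
    print(weighted_error_fitness(truth, preds, fn_weight=0.5))
```

With `fn_weight` above 0.5, a candidate classifier that misses poor-outcome patients scores worse than one making the same number of false alarms, which is the kind of trade-off the abstract describes. Setting `fn_weight=0.5` reduces the score to half the total number of misclassified instances.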

An erratum to this chapter can be found at http://dx.doi.org/10.1007/978-3-642-37577-4_18

Acknowledgments

This work was partially supported by the Neuroscience Program of the Compagnia di San Paolo in Torino.

Author information

Authors and Affiliations

Computational Epidemiology Group, Department of Veterinary Sciences, and Complex Systems Unit, Molecular Biotechnology Center, University of Torino, Torino, Italy
Marco Giacobini
Molecular Biotechnology Center, University of Torino, Torino, Italy
Paolo Provero
ISEGI, Universidade Nova de Lisboa, 1070-312, Lisboa, Portugal
Leonardo Vanneschi
DISCo, University of Milano-Bicocca, 20126, Milan, Italy
Leonardo Vanneschi & Giancarlo Mauri

Corresponding author

Correspondence to Marco Giacobini.

Editor information

Editors and Affiliations

Dept. of Information Engineering, University of Parma, Parma, Italy
Stefano Cagnoni
Consiglio Nazionale delle Ricerche, Istituto di Scienze e Tecnologie della Cognizione, Rome, Italy
Marco Mirolli
Facoltà di Scienze della Comunicazione, University of Modena and Reggio Emilia, Reggio Emilia, Italy
Marco Villani

About this chapter

Cite this chapter

Giacobini, M., Provero, P., Vanneschi, L., Mauri, G. (2014). Towards the Use of Genetic Programming for the Prediction of Survival in Cancer. In: Cagnoni, S., Mirolli, M., Villani, M. (eds) Evolution, Complexity and Artificial Life. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37577-4_12

DOI: https://doi.org/10.1007/978-3-642-37577-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37576-7
Online ISBN: 978-3-642-37577-4
eBook Packages: Computer Science, Computer Science (R0)
