Abstract
The prediction of pharmacokinetic parameters is a crucial phase of the drug discovery process, and automating this task is an active research topic in computational biomedicine. Over the last 10 years, a significant body of research has been published on applications of genetic programming to the prediction of pharmacokinetic parameters. This paper summarizes and discusses some of those contributions. In particular, it focuses on the idea that pharmacokinetic problems are so complex that the canonical version of genetic programming often fails to perform adequately on them. At the same time, genetic programming is highly versatile, because many crucial parts of the algorithm can be adapted, including the fitness function and the genetic operators. This makes it possible to improve standard genetic programming in several ways. For instance, sophisticated fitness functions, methods to control bloat, and operators that exploit the geometry of the semantic space are discussed here.
Notes
From now on, only the median of the results obtained on the test set is discussed. The median is calculated over a set of independent runs, each performed using a different training/test set partition. The interested reader is referred to the papers cited in the text for a detailed discussion of all the experimental settings, including the number of runs, the parameter values used, etc.
With this expression, here and in the rest of the paper, it is meant that the differences between the compared methods are not statistically significant according to the Wilcoxon rank-sum test. This test was always applied with Bonferroni correction whenever more than two methods were compared. It was also always applied only after verifying, with the Kolmogorov-Smirnov test, that the data are not normally distributed, a result that holds consistently for all the results discussed in this paper.
By the term “better”, here and in the rest of the paper, it is meant that the differences between the compared methods are statistically significant according to the statistical test described in the previous footnote.
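The comparison protocol described in these notes (median over independent runs, pairwise Wilcoxon rank-sum test, Bonferroni correction) can be sketched as follows. This is only an illustrative sketch, not the code used in the cited papers: the function names, the method labels, the error values, the significance level of 0.05, the normal approximation for the p-value, and the choice of the number of pairwise comparisons as the Bonferroni factor are all assumptions. The preliminary Kolmogorov-Smirnov normality check is omitted.

```python
import math
from itertools import combinations
from statistics import median


def rank_sum_p_value(a, b):
    """Two-sided Wilcoxon rank-sum p-value.

    Uses the normal approximation with average ranks for ties and a
    tie-corrected variance; no continuity correction is applied.
    """
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        t = j - i                     # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 1.0  # all values identical: no evidence of a difference
    z = (rank_sum_a - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def compare_methods(results, alpha=0.05):
    """Pairwise comparison of methods.

    results maps a method label to its list of test errors, one per run.
    Returns {(x, y): (median_x, median_y, corrected_p, significant)}.
    """
    pairs = list(combinations(sorted(results), 2))
    m = len(pairs)  # assumed Bonferroni factor: number of pairwise tests
    report = {}
    for x, y in pairs:
        p = min(1.0, rank_sum_p_value(results[x], results[y]) * m)
        report[(x, y)] = (median(results[x]), median(results[y]), p, p < alpha)
    return report


# Hypothetical test errors, one value per independent run (not real results).
runs = {
    "method_a": [30.1, 31.4, 29.8, 32.0, 30.7, 31.1, 30.3, 31.8],
    "method_b": [27.2, 26.9, 28.1, 27.5, 26.4, 27.9, 27.0, 26.7],
    "method_c": [27.1, 27.3, 26.8, 27.6, 26.5, 28.0, 27.4, 26.6],
}
report = compare_methods(runs)
for pair, (med_x, med_y, p, significant) in report.items():
    print(pair, med_x, med_y, round(p, 4), significant)
```

In the terminology of these notes, a pair flagged as significant corresponds to one method being "better" than the other, while a pair that is not flagged corresponds to methods whose differences are not statistically significant.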
References
Tuffs A (2001) Bayer faces shake up after Lipobay withdrawn. Br Med J 323(7317):828
Archetti F, Lanzeni S, Messina E, Vanneschi L (2007) Genetic programming for computational pharmacokinetics in drug discovery and development. Genet Program Evol Mach 8:413–432
Castelli M, Manzoni L, Silva S, Vanneschi L (2011) A quantitative study of learning and generalization in genetic programming. In: Silva S et al (eds) Proceedings of the 14th European conference on genetic programming, EuroGP 2011, volume 6621 of LNCS. Springer, Turin, pp 25–36
Castelli M, Silva S, Vanneschi L (2014) A C++ framework for geometric semantic genetic programming. Genet Program Evol Mach pp 1–9
Collard P, Verel S, Clergue M (2004) Local search heuristics: fitness cloud versus fitness landscape. In: Mántaras RLD, Saitta L (eds) European conference on artificial intelligence (ECAI04). IOS Press, Valence, pp 973–974
Dignum S, Poli R (2008) Operator equalisation and bloat free GP. In: O’Neill M et al (eds) Proceedings of the 11th European conference on genetic programming, EuroGP 2008, volume 4971 of Lecture Notes in Computer Science. Springer, Naples, pp 110–121
Yoshida F, Topliss JG (2000) QSAR model for drug human oral bioavailability. J Med Chem 43:2575–2585
van de Waterbeemd H, Gifford E (2003) ADMET in silico modeling: towards prediction paradise? Nat Rev Drug Discov 2:192–204
Kola I, Landis J (2004) Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov 3:711–716
Eddershaw JP, Beresford AP, Bayliss MK (2000) ADME/PK as part of a rational approach to drug discovery. Drug Discov Today 9:409–414
Keijzer M (2003) Improving symbolic regression with interval arithmetic and linear scaling. In: Ryan C et al (eds) Genetic Programming, Proceedings of the 6th European conference, EuroGP 2003, volume 2610 of LNCS, Essex. Springer, Berlin, pp 71–83
Koza JR (1992) Genetic Programming. The MIT Press, Cambridge
Moraglio A, Krawiec K, Johnson CG (2012) Geometric semantic genetic programming. In: Coello Coello CA (ed) Parallel problem solving from nature, volume 7491 of LNCS. Springer, Berlin, pp 21–31
Poli R, Langdon WB, McPhee NF (2008) A field guide to genetic programming.
Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J (2006) DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34. doi:10.1093/nar/gkj067
Silva S, Dignum S (2009) Extending operator equalisation: fitness based self adaptive length distribution for bloat free GP. In: Vanneschi L et al (eds) Genetic programming, volume 5481 of Lecture Notes in Computer Science. Springer, Berlin, pp 159–170
Silva S, Vanneschi L (2009) Operator equalisation, bloat and overfitting: a study on human oral bioavailability prediction. In: Proceedings of the 11th annual conference on genetic and evolutionary computation, GECCO ’09. ACM Press, New York, pp 1115–1122
Silva S, Vanneschi L (2012) Bloat free genetic programming: application to human oral bioavailability prediction. Int J Data Min Bioinform 6(6):585–601
Simulations Plus Inc. (2006) A company that uses both statistical methods and differential-equation-based simulations for ADME parameter estimation. www.simulationsplus.com
Kennedy T (1997) Managing the drug discovery/development interface. Drug Discov Today 2:436–444
Tetko IV, Gasteiger J, Todeschini R, Mauri A, Livingstone D, Ertl P, Palyulin VA, Radchenko EV, Zefirov NS, Makarenko AS, Tanchuk VY, Prokopenko VV (2005) Virtual computational chemistry laboratory—design and description. J Comput Aided Mol Des 19:453–63. http://www.vcclab.org
Norinder U, Bergstrom CAS (2006) Prediction of ADMET properties. ChemMedChem 1:920–937
Vanneschi L (2008) Investigating problem hardness of real life applications. In: Riolo R et al (eds) Genetic programming theory and practice V, genetic and evolutionary computation series. Springer, USA, pp 107–124
Vanneschi L, Castelli M, Manzoni L, Silva S (2013) A new implementation of geometric semantic GP and its application to problems in pharmacokinetics. In: Krawiec K et al (eds) Proceedings of EuroGP LNCS. Springer, Berlin, pp 205–216
Vanneschi L, Castelli M, Silva S (2010) Measuring bloat, overfitting and functional complexity in genetic programming. In: Pelikan M, Branke J (eds) GECCO. ACM Press, New York, pp 877–884
Vanneschi L, Castelli M, Silva S (2014) A survey of semantic methods in genetic programming. Genet Program Evol Mach 1–20
Vanneschi L, Rochat D, Tomassini M (2007) Multi-optimization for generalization in symbolic regression using genetic programming. In: GN et al (ed) Proceedings of the second annual Italian workshop on artificial life and evolutionary computation (WIVACE 2007)
Vanneschi L, Silva S, Castelli M, Manzoni L (2013) Geometric semantic genetic programming for real life applications. In: Riolo R et al (eds) Genetic programming theory and practice XI, genetic and evolutionary computation. Springer, USA. Computer science collection, invited article (to appear)
Langdon WB, Barrett SJ (2004) Genetic programming in data mining for drug discovery. In: Evolutionary computing in data mining, pp 211–235
Acknowledgments
I sincerely thank all the collaborators who have worked with me on this research track over the last decade. In particular, my heartfelt thanks go to Sara Silva, Mauro Castelli, Luca Manzoni and Francesco Archetti. I also acknowledge project MassGP (PTDC/EEI-CTP/2975/2012), FCT (Portugal), for financial support.
Cite this article
Vanneschi, L. Improving genetic programming for the prediction of pharmacokinetic parameters. Memetic Comp. 6, 255–262 (2014). https://doi.org/10.1007/s12293-014-0143-9