Abstract
In this article, we present an automatic model generation process for building QSAR models using Gaussian Processes, a powerful machine learning method. We describe the stages of the process that ensure models are built and validated within a rigorous framework: descriptor calculation; splitting of the data into training, validation and test sets; descriptor filtering; application of modeling techniques; and selection of the best model. We apply this automatic process to data sets of blood–brain barrier penetration and aqueous solubility, and compare the resulting automatically generated models with ‘manually’ built models using external test sets. The results demonstrate the effectiveness of the automatic model generation process for two types of data sets commonly encountered in building ADME QSAR models: a small set of in vivo data and a large set of physico-chemical data.
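The stages listed above can be illustrated with a minimal, pure-Python sketch. It is not the Auto-Modeler implementation, and every function name in it is hypothetical. It assumes descriptor calculation has already produced a numeric matrix. The random 70/15/15 split, the variance and correlation thresholds used for descriptor filtering, and the fixed-noise RBF-kernel Gaussian Process are all illustrative assumptions. So is the choice to select the kernel length-scale by validation RMSE. A single GP scanned over several length-scales stands in for the article's set of modeling techniques.

```python
# Hypothetical sketch of an automatic QSAR modeling pipeline; NOT the Auto-Modeler code.
# Assumes descriptors are already calculated as a numeric matrix X (rows = compounds).
import math
import random

def split_data(X, y, fractions=(0.7, 0.15, 0.15), seed=0):
    """Randomly split compounds into training, validation and test sets (assumed random split)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_tr, n_va = int(fractions[0] * len(idx)), int(fractions[1] * len(idx))
    parts = (idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:])
    return [([X[i] for i in p], [y[i] for i in p]) for p in parts]

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def filter_descriptors(X_train, min_var=1e-6, max_corr=0.95):
    """Drop near-constant descriptors, then any descriptor highly correlated with one already kept.
    Statistics come from the training set only; also returns means/std devs for scaling."""
    n = len(X_train)
    cols = list(zip(*X_train))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    kept = []
    for j, c in enumerate(cols):
        if sds[j] ** 2 >= min_var and all(abs(pearson(c, cols[k])) <= max_corr for k in kept):
            kept.append(j)
    return kept, means, sds

def rbf(a, b, length):
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / (2 * length ** 2))

def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def cho_solve(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(X, y, length, noise):
    """GP regression with a unit-variance RBF kernel and fixed noise; returns the posterior mean."""
    mu = sum(y) / len(y)  # centre targets on the training mean
    K = [[rbf(a, b, length) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    alpha = cho_solve(cholesky(K), [v - mu for v in y])  # (K + noise*I)^-1 (y - mu)
    def predict(X_new):
        return [mu + sum(w * rbf(x, xt, length) for w, xt in zip(alpha, X)) for x in X_new]
    return predict

def rmse(pred, y):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))

def auto_model(X, y, lengths=(0.5, 1.0, 2.0), noise=0.01):
    """Split, filter, fit one GP per candidate length-scale, keep the best on validation."""
    (X_tr, y_tr), (X_va, y_va), (X_te, y_te) = split_data(X, y)
    kept, means, sds = filter_descriptors(X_tr)
    def prep(rows):  # keep filtered descriptors, standardized with training statistics
        return [[(r[j] - means[j]) / sds[j] for j in kept] for r in rows]
    X_tr, X_va, X_te = prep(X_tr), prep(X_va), prep(X_te)
    best = None
    for length in lengths:
        predict = fit_gp(X_tr, y_tr, length, noise)
        score = rmse(predict(X_va), y_va)
        if best is None or score < best[0]:
            best = (score, length, predict)
    val_rmse, length, predict = best
    # The test set is touched only once, after model selection.
    return {"kept": kept, "length": length, "val_rmse": val_rmse,
            "test_rmse": rmse(predict(X_te), y_te)}

# Usage on synthetic data: column 1 duplicates column 0 and column 2 is constant.
rng = random.Random(1)
X, y = [], []
for _ in range(60):
    x0 = rng.uniform(-3, 3)
    X.append([x0, 2 * x0, 5.0, rng.uniform(-3, 3)])
    y.append(math.sin(x0) + rng.gauss(0, 0.05))
result = auto_model(X, y)
print(result)
```

In the synthetic example, the filtering step should discard the duplicated column 1 and the constant column 2. The test set is scored only after the validation set has chosen the model, which mirrors the separation of training, validation and test sets described above. A fuller treatment would optimize the kernel hyperparameters (for example by maximizing the marginal likelihood) rather than scanning a fixed grid.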

Notes
Auto-Modeler™ is a trademark of Galapagos NV and/or its affiliates.
Cite this article
Obrezanova, O., Gola, J.M.R., Champness, E.J. et al. Automatic QSAR modeling of ADME properties: blood–brain barrier penetration and aqueous solubility. J Comput Aided Mol Des 22, 431–440 (2008). https://doi.org/10.1007/s10822-008-9193-8