Abstract
The continuous molecular fields (CMF) approach describes molecular fields with continuous functions instead of the finite sets of molecular descriptors (such as interaction energies computed at grid nodes) commonly used for this purpose. These functions can be encapsulated into kernels and combined with kernel-based machine learning algorithms to provide a variety of novel methods for building classification and regression structure–activity models, visualizing chemical datasets and conducting virtual screening. In this article, the CMF approach is applied to building 3D-QSAR models for eight datasets. The models use five types of molecular fields (electrostatic, steric, hydrophobic, hydrogen-bond acceptor and hydrogen-bond donor), the linear convolution molecular kernel with the contribution of each atom approximated by a single isotropic Gaussian function, and kernel ridge regression as the data analysis technique. It is shown that the CMF approach, even in this simplest form, provides predictive performance that is comparable to or better than that of state-of-the-art 3D-QSAR methods.
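The combination of a Gaussian-based field kernel with kernel ridge regression can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each field is a sum of atom-centred isotropic Gaussians w_i·exp(-α|r - r_i|²) sharing one width parameter α, and that the kernel between two aligned molecules is the overlap integral of their fields, which then has the closed form (π/2α)^{3/2} Σ_i Σ_j w_i w_j exp(-α|r_i - r_j|²/2). The function names, the value of α, the regularization strength and the toy coordinates, weights and activities are all hypothetical.

```python
import numpy as np

def field_kernel(coords_a, weights_a, coords_b, weights_b, alpha=0.3):
    """Overlap integral of two fields, each a sum of isotropic Gaussians
    w_i * exp(-alpha * |r - r_i|^2) centred on the atoms (assumed form)."""
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)            # squared interatomic distances
    pref = (np.pi / (2.0 * alpha)) ** 1.5      # analytic Gaussian overlap prefactor
    return float(pref * (weights_a @ np.exp(-0.5 * alpha * d2) @ weights_b))

def gram_matrix(mols_a, mols_b, alpha=0.3):
    """Kernel matrix between two lists of (coords, weights) molecules."""
    return np.array([[field_kernel(ca, wa, cb, wb, alpha) for cb, wb in mols_b]
                     for ca, wa in mols_a])

def krr_fit(K, y, lam=1e-2):
    """Kernel ridge regression: solve (K + lam * I) a = y for the coefficients a."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krr_predict(K_new_train, coef):
    """Predict activities from the kernel between new and training molecules."""
    return K_new_train @ coef

# Toy, pre-aligned "molecules": atom coordinates and per-atom field weights
# (for example partial charges for an electrostatic field). Hypothetical data.
train = [
    (np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]), np.array([0.4, -0.4])),
    (np.array([[0.0, 0.0, 0.0], [0.0, 1.4, 0.0]]), np.array([0.2, -0.2])),
    (np.array([[0.0, 0.0, 0.0], [1.2, 1.2, 0.0]]), np.array([-0.3, 0.3])),
]
y_train = np.array([5.1, 4.3, 6.0])            # hypothetical activity values

K_train = gram_matrix(train, train)
coef = krr_fit(K_train, y_train)

query = [(np.array([[0.0, 0.0, 0.0], [1.4, 0.2, 0.0]]), np.array([0.35, -0.35]))]
y_pred = krr_predict(gram_matrix(query, train), coef)
```

Under the same assumptions, several field types could be combined by summing one such kernel per field, possibly with per-field weights, since a non-negative weighted sum of valid kernels is again a valid kernel.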



Acknowledgments
The authors thank Prof. Yu. A. Ustynyuk for stimulating discussion and advice. The authors also thank Prof. A. Varnek and Dr. G. Marcou for valuable comments regarding the developed approach. This work was supported by the Russian Foundation for Basic Research (Grant 13-07-00511).
Cite this article
Baskin, I.I., Zhokhova, N.I. The continuous molecular fields approach to building 3D-QSAR models. J Comput Aided Mol Des 27, 427–442 (2013). https://doi.org/10.1007/s10822-013-9656-4
DOI: https://doi.org/10.1007/s10822-013-9656-4