Abstract
In this work, computational compound screening strategies based on two- and three-dimensional (2D and 3D) molecular representations were investigated, including similarity searching and support vector machine (SVM) ranking. Calculations based on topological fingerprints were compared with calculations based on molecular shape queries and shape features. A unique aspect of the analysis, which sets it apart from previous comparisons of 2D and 3D virtual screening approaches, was the design of compound reference, training, and test sets with controlled incremental increases in intra-set structural diversity and with different categories of structural relationships between reference/training and test sets. These data sets made it possible to assess the relative performance of 2D and 3D screening strategies under increasingly challenging conditions, ultimately leading to training and test sets with essentially unrelated structures. The results showed that 3D similarity searching had little advantage over 2D searching in identifying active compounds with remote structural relationships. However, 3D SVM models trained on shape features were superior to other approaches (including 2D SVM) when the detection of structure–activity relationships became increasingly challenging. Such 3D SVM methods have thus far been little investigated in virtual screening, providing a wealth of opportunities for further analyses.
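To make the two 2D strategies compared above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes fingerprints are given as binary bit vectors, scores similarity searching by the maximum Tanimoto similarity to any reference compound (MAX fusion, an assumption, since the abstract does not state the fusion rule), and ranks compounds with an SVM using a precomputed Tanimoto kernel via scikit-learn's `SVC`. The function names and the 8-bit toy fingerprints are hypothetical. The 3D variants, which rely on shape queries and shape-derived features, are not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC


def tanimoto_matrix(A, B):
    """Pairwise Tanimoto similarity between the rows of binary matrices A (n x d) and B (m x d)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    inter = A @ B.T  # number of bits set in both fingerprints
    union = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :] - inter  # bits set in either
    # Define the similarity of two all-zero fingerprints as 0 instead of dividing by zero.
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


def rank_by_similarity(references, database):
    """Similarity searching with MAX fusion (assumed): score = highest Tanimoto
    similarity to any reference compound; higher scores rank first."""
    scores = tanimoto_matrix(database, references).max(axis=1)
    return np.argsort(-scores, kind="stable"), scores


def rank_by_svm(train_X, train_y, database, C=1.0):
    """SVM ranking with a precomputed Tanimoto kernel: train on actives (1) vs.
    inactives (0), then rank by the signed decision value (positive = active side)."""
    model = SVC(kernel="precomputed", C=C)
    model.fit(tanimoto_matrix(train_X, train_X), train_y)
    scores = model.decision_function(tanimoto_matrix(database, train_X))
    return np.argsort(-scores, kind="stable"), scores


# Hypothetical 8-bit fingerprints for illustration only.
actives = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                    [1, 1, 1, 0, 1, 0, 0, 0]])
inactives = np.array([[0, 0, 0, 0, 1, 1, 1, 1],
                      [0, 0, 0, 1, 0, 1, 1, 1]])
database = np.array([[0, 0, 0, 0, 1, 1, 1, 0],   # resembles the inactives
                     [1, 1, 0, 1, 0, 0, 0, 0]])  # resembles the actives

sim_order, sim_scores = rank_by_similarity(actives, database)
train_X = np.vstack([actives, inactives])
train_y = np.array([1, 1, 0, 0])
svm_order, svm_scores = rank_by_svm(train_X, train_y, database)
print(sim_order, sim_scores.round(3))
print(svm_order, svm_scores.round(3))
```

In both cases database compounds are ranked by a single score (fused similarity or signed distance from the SVM hyperplane), so the two strategies can be compared with the same ranking-based measures.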
Acknowledgements
We thank OpenEye Scientific Software, Inc., for providing a free academic license of the OpenEye chemistry toolkits.
About this article
Cite this article
Miyao, T., Jasial, S., Bajorath, J. et al. Evaluation of different virtual screening strategies on the basis of compound sets with characteristic core distributions and dissimilarity relationships. J Comput Aided Mol Des 33, 729–743 (2019). https://doi.org/10.1007/s10822-019-00218-8