Summary
Structure-based screening using fully flexible docking is still too slow for large molecular libraries. High-quality docking of a one-million-compound library can take days, even on a cluster with hundreds of CPUs. This performance issue prevents the use of fully flexible docking in the design of large combinatorial libraries. We have developed a fast structure-based screening method that docks a limited number of compounds and uses the results to build a 2D QSAR model, which then rapidly scores the rest of the database. Here we compare a model based on radial basis functions with a Bayesian categorization model. The number of compounds that actually need to be docked depends on the number of docking hits found: in our case studies, models of reasonable quality were built once enough molecules had been docked to yield 50 docking hits. The rest of the library is screened by the QSAR model. Optionally, a fraction of the QSAR-prioritized library can then be docked to find the true docking hits. Because the quality of the model depends only on the training set size, not on the size of the library to be screened, the gain in speed grows with library size while performance stays the same. Prioritizing a large library with these models provides a significant enrichment in docking hits, reaching enrichment values of 13 and 35 at the top of the score-sorted libraries in our two case studies: screening of the NCI collection and of a combinatorial library against the CDK2 kinase structure. With such enrichments, only a fraction of the database must actually be docked to find many of the true hits. The throughput of the method allows its use in screening large compound collections and in designing large combinatorial libraries. The proposed strategy has an important effect on efficiency but does not affect the retrieval of actives, which is determined by the quality of the docking method itself.
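The surrogate-docking loop described above can be illustrated with a minimal sketch. It assumes a toy setting that is not from the paper: each compound is a set of hashed 2D fingerprint feature IDs, and a hypothetical `dock_score` function stands in for the expensive docking run. Random compounds are "docked" until the training set contains 50 docking hits; a Laplacian-corrected naive Bayesian classifier (one common form of Bayesian categorization for fingerprint data, used here in place of the paper's exact model, descriptors and settings) is trained on them; and the undocked remainder is ranked by model score. The enrichment factor is computed with the usual definition, hit rate in the top fraction divided by overall hit rate. All names (`make_library`, `train_bayes`, `surrogate_dock`, `KEY_FEATURES`, the hit cutoff of -2) are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical stand-in for an expensive docking run (lower score = better).
# In this toy setup a compound "docks well" when it carries several of a few
# key fingerprint features; real docking scores come from a 3D docking program.
KEY_FEATURES = frozenset({0, 1, 2, 3})


def dock_score(features):
    return -len(features & KEY_FEATURES)


def make_library(n, n_features=200, size=12, seed=0):
    """Toy library: each compound is a set of hashed 2D fingerprint feature IDs."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(n_features), size)) for _ in range(n)]


def train_bayes(molecules, labels):
    """Laplacian-corrected naive Bayes weights: log((A + 1) / (T * P + 1)),
    where A = hits containing the feature, T = compounds containing it,
    and P = overall hit rate of the training set."""
    p_base = sum(labels) / len(labels)
    total, active = Counter(), Counter()
    for feats, is_hit in zip(molecules, labels):
        total.update(feats)
        if is_hit:
            active.update(feats)
    return {f: math.log((active[f] + 1) / (total[f] * p_base + 1)) for f in total}


def bayes_score(weights, features):
    # Features never seen during training contribute nothing.
    return sum(weights.get(f, 0.0) for f in features)


def surrogate_dock(library, hit_cutoff, target_hits=50, seed=1):
    """Dock random compounds until target_hits docking hits are found, train the
    2D model on them, and rank the undocked remainder by model score."""
    order = list(range(len(library)))
    random.Random(seed).shuffle(order)
    train_idx, train_labels, n_hits = [], [], 0
    for i in order:
        is_hit = dock_score(library[i]) <= hit_cutoff  # the expensive step
        train_idx.append(i)
        train_labels.append(is_hit)
        n_hits += is_hit
        if n_hits >= target_hits:
            break
    weights = train_bayes([library[i] for i in train_idx], train_labels)
    rest = order[len(train_idx):]  # compounds that were never docked
    ranked = sorted(rest, key=lambda i: bayes_score(weights, library[i]), reverse=True)
    return ranked, weights, len(train_idx)


def enrichment_factor(ranked_labels, fraction):
    """Hit rate in the top `fraction` of a ranked list divided by the overall hit rate."""
    n_top = max(1, int(len(ranked_labels) * fraction))
    overall = sum(ranked_labels) / len(ranked_labels)
    if overall == 0:
        raise ValueError("no hits in the ranked list")
    return (sum(ranked_labels[:n_top]) / n_top) / overall


if __name__ == "__main__":
    library = make_library(10_000)
    ranked, weights, n_docked = surrogate_dock(library, hit_cutoff=-2)
    # Evaluation only: in practice just the top of `ranked` would be docked.
    labels = [dock_score(library[i]) <= -2 for i in ranked]
    print(f"docked {n_docked} compounds to train; EF(1%) = {enrichment_factor(labels, 0.01):.1f}")
```

Docking only the first few percent of `ranked` corresponds to the optional final docking step. The stopping rule depends on the number of hits found, not on library size, which mirrors the point that training cost is fixed while the cheap scoring pass scales with the library. A naive Bayesian model is convenient here because training is a single counting pass over the docked compounds.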
Acknowledgement
The authors gratefully acknowledge Dr Weida Tong (National Center for Toxicological Research, U.S. Food & Drug Administration, Jefferson, AR) for providing the experimental data set and 2D structures of the estrogen receptor compounds used in this study.
Cite this article
Yoon, S., Smellie, A., Hartsough, D. et al. Surrogate docking: structure-based virtual screening at high throughput speed. J Comput Aided Mol Des 19, 483–497 (2005). https://doi.org/10.1007/s10822-005-9002-6