Abstract
Molecular docking is a widely employed method in structure-based drug design. An essential component of molecular docking programs is a scoring function (SF) that can be used to identify the most stable binding pose of a ligand, when bound to a receptor protein, from among a large set of candidate poses. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited docking power (that is, their ability to successfully identify the correct pose) has been a major impediment to cost-effective drug discovery. Therefore, in this work, we explore a range of novel SFs employing different machine-learning (ML) approaches in conjunction with physicochemical and geometrical features characterizing protein-ligand complexes to predict the native or near-native pose of a ligand docked to a receptor protein’s binding site. We assess the docking accuracies of these new ML SFs as well as those of conventional SFs in the context of the 2007 PDBbind benchmark datasets on both diverse and homogeneous (protein-family-specific) test sets. We find that the best-performing ML SF has a success rate of 80 % in identifying poses that are within 1 Å root-mean-square deviation from the native poses of 65 different protein families. This is in comparison to a success rate of only 70 % achieved by the best conventional SF, ASP, employed in the commercial docking software GOLD. We also observe steady gains in the performance of the proposed ML SFs as the training set size is increased by considering more protein-ligand complexes and/or more computationally generated poses for each complex.
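The success rate quoted above can be read as the fraction of complexes for which the top-scored candidate pose lies within a given RMSD cutoff of the native pose. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `docking_success_rate`, the `(score, rmsd)` pose representation, and the toy data are assumptions made for illustration, as is the convention that a higher score means a more favorable pose.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each candidate pose is a pair of
# (score assigned by the scoring function, RMSD to the native pose in angstroms).
Pose = Tuple[float, float]


def docking_success_rate(
    complexes: Dict[str, List[Pose]],
    rmsd_cutoff: float = 1.0,
    higher_is_better: bool = True,
) -> float:
    """Return the fraction of complexes whose top-scored pose is within rmsd_cutoff."""
    if not complexes:
        raise ValueError("at least one complex is required")

    # Assumption: some scoring functions rank by highest score, others by lowest energy.
    pick = max if higher_is_better else min

    hits = 0
    for name, poses in complexes.items():
        if not poses:
            raise ValueError(f"complex {name!r} has no candidate poses")
        # Select the pose the scoring function ranks first.
        _, top_rmsd = pick(poses, key=lambda pose: pose[0])
        # Count a success when that pose is native or near-native.
        if top_rmsd <= rmsd_cutoff:
            hits += 1
    return hits / len(complexes)


# Toy data, invented purely to exercise the function.
toy = {
    "complex_a": [(7.2, 0.4), (6.1, 3.5), (5.0, 8.0)],
    "complex_b": [(6.8, 2.9), (6.5, 0.7)],
    "complex_c": [(5.5, 0.9), (4.0, 1.8)],
    "complex_d": [(8.1, 0.2), (7.9, 4.4)],
}

print(f"success rate at 1 A: {docking_success_rate(toy):.0%}")
```

In the toy data, the top-scored pose of `complex_b` is 2.9 Å from the native pose, so three of the four complexes count as successes at the 1 Å cutoff.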
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ashtawy, H.M., Mahapatra, N.R. (2014). Molecular Docking for Drug Discovery: Machine-Learning Approaches for Native Pose Prediction of Protein-Ligand Complexes. In: Formenti, E., Tagliaferri, R., Wit, E. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2013. Lecture Notes in Computer Science(), vol 8452. Springer, Cham. https://doi.org/10.1007/978-3-319-09042-9_2
DOI: https://doi.org/10.1007/978-3-319-09042-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09041-2
Online ISBN: 978-3-319-09042-9
eBook Packages: Computer Science (R0)