Molecular Docking for Drug Discovery: Machine-Learning Approaches for Native Pose Prediction of Protein-Ligand Complexes

Ashtawy, Hossam M.; Mahapatra, Nihar R.

doi:10.1007/978-3-319-09042-9_2

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8452)

Included in the following conference series:

International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics

Abstract

Molecular docking is a widely employed method in structure-based drug design. An essential component of molecular docking programs is a scoring function (SF) that can be used to identify the most stable binding pose of a ligand, when bound to a receptor protein, from among a large set of candidate poses. Despite intense efforts in developing conventional SFs, which are force-field based, knowledge-based, or empirical, their limited docking power (the ability to successfully identify the correct pose) has been a major impediment to cost-effective drug discovery. Therefore, in this work, we explore a range of novel SFs that employ different machine-learning (ML) approaches in conjunction with physicochemical and geometrical features characterizing protein-ligand complexes to predict the native or near-native pose of a ligand docked to a receptor protein's binding site. We assess the docking accuracies of these new ML SFs, as well as those of conventional SFs, on the 2007 PDBbind benchmark datasets using both diverse and homogeneous (protein-family-specific) test sets. We find that the best-performing ML SF has a success rate of 80% in identifying poses that are within 1 Å root-mean-square deviation (RMSD) of the native poses across 65 different protein families. This compares with a success rate of only 70% achieved by the best conventional SF, ASP, employed in the commercial docking software GOLD. We also observe steady gains in the performance of the proposed ML SFs as the training set size is increased by considering more protein-ligand complexes and/or more computationally generated poses for each complex.
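The success rate quoted in the abstract can be illustrated with a minimal sketch. It assumes that each complex comes with a list of candidate poses, each carrying a predicted score and an RMSD from the native pose, and that a complex counts as a success when its top-scored pose falls within the RMSD cutoff. The function names, the data layout, and the toy numbers below are hypothetical and are not taken from the chapter; the authors' exact evaluation protocol may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (predicted score, RMSD from the native pose in angstroms)
Pose = Tuple[float, float]


def top_pose_rmsd(poses: List[Pose], higher_is_better: bool = True) -> float:
    """Return the RMSD of the pose that the scoring function ranks first."""
    if not poses:
        raise ValueError("each complex needs at least one candidate pose")
    pick = max if higher_is_better else min
    best_pose = pick(poses, key=lambda pose: pose[0])
    return best_pose[1]


def docking_success_rate(
    complexes: Dict[str, List[Pose]],
    cutoff: float = 1.0,
    higher_is_better: bool = True,
) -> float:
    """Fraction of complexes whose top-scored pose lies within `cutoff` of native."""
    if not complexes:
        raise ValueError("at least one complex is required")
    hits = sum(
        top_pose_rmsd(poses, higher_is_better) <= cutoff
        for poses in complexes.values()
    )
    return hits / len(complexes)


# Toy data (invented for illustration only).
toy_complexes = {
    "complex_a": [(7.2, 0.4), (6.1, 2.9), (5.0, 5.3)],
    "complex_b": [(4.8, 0.8), (6.5, 3.1)],
    "complex_c": [(8.0, 0.9), (7.9, 0.2)],
    "complex_d": [(3.3, 1.6), (2.0, 0.5)],
}

rate_1a = docking_success_rate(toy_complexes, cutoff=1.0)
rate_2a = docking_success_rate(toy_complexes, cutoff=2.0)
print(f"success rate at 1 A: {rate_1a:.2f}")
print(f"success rate at 2 A: {rate_2a:.2f}")
```

The `higher_is_better` flag is included because some scoring functions report energies, where lower values indicate better poses, while others report scores or predicted affinities, where higher values do.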

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, 48824, USA
Hossam M. Ashtawy & Nihar R. Mahapatra

Corresponding author

Correspondence to Nihar R. Mahapatra.

Editor information

Editors and Affiliations

University Nice Sophia Antipolis, Sophia Antipolis, France
Enrico Formenti
University of Salerno, Fisciano, Italy
Roberto Tagliaferri
University of Groningen, AG Groningen, The Netherlands
Ernst Wit

Cite this paper

Ashtawy, H.M., Mahapatra, N.R. (2014). Molecular Docking for Drug Discovery: Machine-Learning Approaches for Native Pose Prediction of Protein-Ligand Complexes. In: Formenti, E., Tagliaferri, R., Wit, E. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2013. Lecture Notes in Computer Science, vol 8452. Springer, Cham. https://doi.org/10.1007/978-3-319-09042-9_2

DOI: https://doi.org/10.1007/978-3-319-09042-9_2
Published: 16 July 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09041-2
Online ISBN: 978-3-319-09042-9
eBook Packages: Computer Science, Computer Science (R0)
