Abstract
Scoring functions are routinely deployed in structure-based drug design to quantify the potential for protein–ligand (PL) complex formation. Here, we present Bappl+, a new scoring function designed to predict the binding affinities of both non-metallo and metallo PL complexes. Bappl+ outperforms other state-of-the-art scoring functions, achieving a high Pearson correlation coefficient of up to ~0.76 with low standard deviations. The largest contributors to this improved performance are the use of a machine-learning model and an enlarged training dataset. We have also evaluated the performance of Bappl+ on target-specific proteins; this evaluation highlighted the limitations of our function and points to directions for further improvement. We believe that the Bappl+ methodology could prove valuable in ranking candidate molecules against a metallo or non-metallo target protein by reliably predicting their binding affinities, thereby aiding the drug discovery process.
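The abstract reports performance as a Pearson correlation coefficient between predicted and experimental binding affinities, and proposes using the predictions to rank candidate molecules. The sketch below illustrates both steps in plain Python. It is not the Bappl+ code: the function names `pearson_r` and `rank_by_predicted_affinity` and the ligand names in the usage example are hypothetical, and the ranking assumes affinities are expressed on a scale where larger values mean tighter binding (for example, pKd).

```python
import math
from statistics import mean


def pearson_r(predicted, experimental):
    """Pearson correlation between predicted and experimental affinities.

    Hypothetical helper for illustration; not part of Bappl+.
    """
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(predicted), mean(experimental)
    dx = [x - mx for x in predicted]      # deviations from the predicted mean
    dy = [y - my for y in experimental]   # deviations from the experimental mean
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("r is undefined when either input is constant")
    return cov / norm


def rank_by_predicted_affinity(scores):
    """Return ligand names sorted strongest binder first.

    Assumes larger scores (e.g. pKd) indicate tighter binding.
    """
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical predicted affinities for three candidate ligands.
    preds = {"lig_a": 6.2, "lig_b": 8.1, "lig_c": 7.0}
    print(rank_by_predicted_affinity(preds))
    print(pearson_r([1, 2, 3], [2, 4, 6]))
```

A perfectly linear relationship gives r = 1.0, and the hypothetical ligands are ranked `lig_b`, `lig_c`, `lig_a`. For scales where more negative means tighter binding (for example, ΔG in kcal/mol), the sort order would need to be reversed.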

Acknowledgements
The authors gratefully acknowledge support to SCFBio from the Department of Biotechnology, Govt. of India. The authors thank Dr. Prashant S. Rana for sharing his insights into random forests and Mr. Manpreet Singh for web-enabling Bappl+.
Author information
Contributions
AS and BJ conceived the project. AS performed all the calculations. RB helped fine-tune the work and build the web server. All authors analyzed the data and wrote the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Soni, A., Bhat, R. & Jayaram, B. Improving the binding affinity estimations of protein–ligand complexes using machine-learning facilitated force field method. J Comput Aided Mol Des 34, 817–830 (2020). https://doi.org/10.1007/s10822-020-00305-1
DOI: https://doi.org/10.1007/s10822-020-00305-1