Abstract
Docking is a structure-based computational tool that can be used to predict the strength with which a small ligand molecule binds to a macromolecular target. Such binding affinity prediction is crucial for designing molecules that bind more tightly to a target and are thus more likely to provide the most efficacious modulation of the target’s biochemical function. Despite intense research over the years, improving this type of predictive accuracy has proven very challenging for any class of method.
New scoring functions based on non-parametric machine-learning regression models, which can effectively exploit much larger volumes of experimental data and circumvent the need for a predetermined functional form, have become the most accurate at predicting the binding affinity of diverse protein-ligand complexes. In this focused review, we describe the inception and further development of RF-Score, the first machine-learning scoring function to achieve a substantial improvement over classical scoring functions at binding affinity prediction. RF-Score employs Random Forest (RF) regression to relate a structural description of the complex to its binding affinity. This overview covers adequate benchmarking practices, studies exploring optimal intermolecular features, further improvements, and RF-Score software availability, including a user-friendly docking webserver and standalone software for rescoring docked poses. Some work has also been done on applying RF-Score to the related problem of virtual screening. Comprehensive retrospective virtual screening studies of RF-based scoring functions are now one of the next research steps.
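The abstract does not say what the "structural description of the complex" looks like. As a minimal sketch, assume it is a fixed-length vector of protein-ligand element-pair contact counts within a distance cutoff. The element sets, the 12 Å cutoff, the function name `contact_counts` and the toy coordinates below are all illustrative assumptions, not details taken from this text.

```python
from collections import Counter
from itertools import product
from math import dist

# Illustrative element sets (an assumption, not taken from the abstract).
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "S")


def contact_counts(protein_atoms, ligand_atoms, cutoff=12.0):
    """Count protein-ligand element pairs closer than `cutoff` angstroms.

    Each atom is a tuple (element, (x, y, z)). Returns a list with one
    count per (protein element, ligand element) pair, in a fixed order.
    """
    counts = Counter()
    for (p_el, p_xyz), (l_el, l_xyz) in product(protein_atoms, ligand_atoms):
        if dist(p_xyz, l_xyz) < cutoff:
            counts[(p_el, l_el)] += 1
    # Fixed ordering so every complex maps to a vector with the same layout.
    return [counts[pair] for pair in product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS)]


# Toy complex with made-up coordinates (hypothetical data).
protein = [("C", (0.0, 0.0, 0.0)), ("N", (3.0, 0.0, 0.0)), ("O", (30.0, 0.0, 0.0))]
ligand = [("C", (1.0, 0.0, 0.0)), ("O", (0.0, 4.0, 0.0))]
features = contact_counts(protein, ligand)
```

Under these assumptions, one such vector per complex would form a row of the feature matrix, and a Random Forest regressor would be trained to map those rows to measured binding affinities.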
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, H., Leung, KS., Wong, MH., Ballester, P.J. (2015). The Use of Random Forest to Predict Binding Affinity in Docking. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2015. Lecture Notes in Computer Science(), vol 9044. Springer, Cham. https://doi.org/10.1007/978-3-319-16480-9_24
DOI: https://doi.org/10.1007/978-3-319-16480-9_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16479-3
Online ISBN: 978-3-319-16480-9
eBook Packages: Computer Science, Computer Science (R0)