A scalable and accurate method for classifying protein–ligand binding geometries using a MapReduce approach
Section snippets
Introduction and motivation
Cutting-edge distributed technologies, such as cloud and volunteer computing, provide scientists with an efficient and scalable way to perform computationally expensive simulations at a rate never seen before. However, this new capability to perform longer simulations presents new challenges for scientists who have to deal with the analysis, sorting, and selection of scientifically meaningful results from the massive amounts of data collected. Clustering techniques are an effective approach
Protein–ligand docking, cross docking, and receptor ensemble docking
The computational search for potential drug-like lead molecules in virtual screening relies on molecular protein–ligand docking to simulate the docking of small molecules (also called ligands) into proteins involved in the disease process. Protein–ligand docking is a search with uncertainties in a very large space of potential docking conformations; this space is shaped by the protein, the ligand, the computational methods, and the degrees of freedom to be explored [17]. Given a protein,
Methodology
We propose a novel method to identify 3D ligand conformations docked into one or multiple protein conformations, together with a scalable implementation of the method using MapReduce. The load balancing, fault tolerance, and scalability of MapReduce make the method attractive for exhaustively screening the large resulting space of ligand conformations, a task that is difficult for traditional clustering methods.
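As a rough illustration of how a MapReduce-style classification of docked conformations might be organized, the sketch below maps each conformation to a single 3D point, assigns that point to a grid cell in the map step, and counts points per cell in the reduce step. It is a minimal single-process sketch under stated assumptions, not the authors' implementation: the centroid encoding, the uniform grid, and the names `encode_conformation`, `map_phase`, `reduce_phase`, and `densest_cell` are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def encode_conformation(atoms: List[Point]) -> Point:
    """Reduce one ligand conformation to a single 3D point.

    ASSUMPTION: the centroid of the atom coordinates is used here purely
    as a placeholder; the encoding used in the paper is not shown in
    this excerpt.
    """
    n = len(atoms)
    return (
        sum(a[0] for a in atoms) / n,
        sum(a[1] for a in atoms) / n,
        sum(a[2] for a in atoms) / n,
    )


def map_phase(point: Point, cell_size: float) -> Tuple[Cell, int]:
    """Map step: emit a (grid cell, 1) key/value pair for one encoded point."""
    cell = (
        math.floor(point[0] / cell_size),
        math.floor(point[1] / cell_size),
        math.floor(point[2] / cell_size),
    )
    return cell, 1


def reduce_phase(pairs: Iterable[Tuple[Cell, int]]) -> Dict[Cell, int]:
    """Reduce step: sum the values emitted for each grid cell."""
    counts: Dict[Cell, int] = defaultdict(int)
    for cell, value in pairs:
        counts[cell] += value
    return dict(counts)


def densest_cell(conformations: List[List[Point]], cell_size: float = 1.0) -> Tuple[Cell, int]:
    """Run map and reduce in-process and return the most populated cell."""
    pairs = [map_phase(encode_conformation(c), cell_size) for c in conformations]
    counts = reduce_phase(pairs)
    return max(counts.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Toy input: three conformations near the origin, one far away.
    toy = [
        [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3)],
        [(0.2, 0.2, 0.2), (0.4, 0.4, 0.4)],
        [(0.5, 0.5, 0.5)],
        [(5.0, 5.0, 5.0)],
    ]
    print(densest_cell(toy, cell_size=1.0))
```

In an actual Hadoop deployment the map and reduce functions would run as distributed tasks over partitions of the conformation dataset, which is where the load balancing and fault tolerance mentioned above would come from; the in-process version only shows the data flow.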
Test set-up
We collected the dataset used to test our proposed method for classifying binding geometries through the D@H project. On D@H, we ran docking trials for 23 protein–ligand complexes for HIV protease (an aspartic acid protease protein), 21 protein–ligand complexes for Trypsin (a serine protease protein), and 12 protein–ligand complexes for P38alpha kinase (a serine/threonine kinase protein). We also ran cross docking trials for 24 ligands, each docking into 24 protein conformations of the HIV
Related work
Exploring the search space of docking conformations has been approached using a variety of techniques including data analytics and clustering. Analytic approaches usually select one or multiple conformations that are likely to be near-native at runtime and then perform an extensive sampling around the predicted conformations. Important work in this direction includes Yang et al. [32] and Liang et al. [21]. These approaches improve the accuracy of docking methods and increase the probability of
Conclusions
In protein–ligand docking, accurately ranking a series of ligand conformations (scoring) is important to successfully predict whether a given ligand will bind to one protein more favorably than others. It is acknowledged that energy-based scoring methods are error-prone and that traditional clustering methods based on geometries are not scalable. Still, protein–ligand docking simulations are delivering increasingly larger datasets of ligand conformations, and accurate solutions that are also
Conflict of interest statement
Roger Armen's CoI:
PI's Collaborators and Co-Editors (Past 48 Months)
Collaborators: C.L. Brooks III (U Michigan), A. Mapp (U Michigan), M. Taufer (U Delaware), D.J. Doren (U Delaware), T.O. Chan (TJU), U. Rodeck (TJU), J.M. Pascal (TJU), J.Y. Cheung (Temple), A.M. Feldman (Temple), J.L. Benovic (TJU), C.P. Scott (TJU), R.A. Panettieri (U Penn), S.B. Liggett (U Maryland), R.B. Penn (U Maryland), B. Lu (TJU), A.P. Dicker (TJU), J.F. Zhang (TJU).
PI's Graduate Advisors
Valerie Daggett (University of
Acknowledgments
This work was supported by the NSF IIS #0968350, entitled "Collaborative Research: SoCS - ExSciTecH: An Interactive, Easy-to-Use Volunteer Computing System to Explore Science, Technology, and Health", and by the NSF OCI Cooperative Agreement #0910847, entitled "Flash Gordon: A Data Intensive Computer". We used the Trestles and Gordon-ION resources of TeraGrid and XSEDE, which are provided by SDSC.
The authors thank Joshua Bernstein (Penguin Computing Inc.) for his help in installing and setting Hadoop on our
References (36)
- et al., Flexible ligand docking to multiple receptor conformations: a practical alternative, Curr. Opin. Struct. Biol. (2008)
- et al., The statistical-thermodynamic basis for computation of binding affinities: a critical review, J. Biophys. (1997)
- et al., A fast flexible docking method using an incremental construction algorithm, J. Mol. Biol. (1996)
- et al., A new method for protein modeling and design: applications to docking and structure prediction from the distorted native conformation, J. Comput. Chem. (1996)
- D.P. Anderson, BOINC: a system for public-resource computing and storage, in: Proceedings of the Fifth IEEE/ACM...
- et al., Automatic clustering of docking poses in virtual screening process using self-organising map, Bioinf. Adv. Access (2009)
- et al., CHARMM: a program for macromolecular energy minimization, and dynamics calculations, J. Comput. Chem. (1983)
- et al., Comparative study of several algorithms for flexible ligand docking, J. Comp. Aided Mol. Des. (2003)
- et al., Empirical entropic contributions in computational docking: evaluation in APS reductase complexes, J. Comput. Chem. (2008)
- R.L.F. Cordeiro, C. Traina, Jr., A.J.M. Traina, J. López, U. Kang, C. Faloutsos, Clustering very large...
- Assessing scoring functions for protein–ligand interactions, J. Med. Chem.
- How to do an evaluation: pitfalls and traps, J. Comp. Aided Mol. Des.
- Efficient calculation of configurational entropy from molecular simulations by combining the mutual-information expansion and nearest-neighbor methods, J. Comput. Chem.
- Bias, reporting, and sharing: computational evaluations of docking methods, J. Comp. Aided Mol. Des.
Cited by (26)
The growing role of integrated and insightful big and real-time data analytics platforms
2020, Advances in Computers
Big data handling mechanisms in the healthcare applications: A comprehensive and systematic literature review
2018, Journal of Biomedical Informatics
Citation Excerpt: However, this method has a low level of training efficiency in certain size sample. A scalable and accurate method for classifying the protein-ligand binding geometries has been done by Estrada [54] in molecular docking. The first step of this method is to encode of the geometry of a three-dimensional ligand adaptation into a single three-dimensional point in the space.
An optimal big data workflow for biomedical image analysis
2018, Informatics in Medicine Unlocked
Citation Excerpt: MapReduce programming is a special form of a directed acyclic graph (DAG) which is applicable to a wide range of used cases. MapReduce is organized in two functions [51,52]. The first one is a Map function, which transforms an element of data into some number of key/value pairs.
Enabling scalable and accurate clustering of distributed ligand geometries on supercomputers
2017, Parallel Computing
Citation Excerpt: Traditionally, docked conformations with minimum energy are assumed to be near-native. Research has shown, however, that this is not always the case [5]. Since selecting the near-native ligand geometry based on energy alone may result in incorrect conclusions, an alternative approach selects the near-native geometry from clustering.
The usage of internet of things in healthcare: A review of mechanisms, platforms, and opportunities from a new perspective
2023, Journal of Intelligent and Fuzzy Systems
Memory-Efficient and Skew-Tolerant MapReduce over MPI for Supercomputing Systems
2020, IEEE Transactions on Parallel and Distributed Systems
1 T. Estrada and B. Zhang have contributed equally to this work.