Multi-GPU-based detection of protein cavities using critical points
Introduction
Many biological processes in the life sciences, in particular those involving drug interactions and protein docking, occur in water. The interaction between water and a molecule reveals much about the molecule's shape, including the location of its binding sites. As Mezey noted in [1], this is of great importance to research in chemistry, biophysics, medicine, and nanotechnology. A better interpretation and identification of such regions on a molecular surface can greatly help in discovering new drugs. Hence, identifying these binding sites is often the first step in the study of protein function, as in structure-based drug design.
However, many small molecules (i.e., ligands) can bind to a given protein, depending on the number of binding sites on its molecular surface. As noted by Henrich et al. [2], checking in the lab whether a certain molecule binds to a particular protein is time-consuming. In general, binding sites correspond to concave, cleft-shaped, or tunnel-shaped regions on a protein surface, called pockets or cavities (cf. Kawabata and Go [3]), but not all cavities end up being binding sites for small ligands. Thus, detecting binding sites relies on efficient computational algorithms that locate all cavities on the molecular surface.
In this paper, we describe a method that identifies cavities on the protein surface as tentative binding sites for ligands. The novelty of the algorithm lies in directly evaluating the curvature of the scalar field (or function) that describes the molecular surface, instead of evaluating the curvature of the Connolly function [4] or the Mitchell–Kerr–Eyck function [5] over the molecular surface. This gives the method an advantage over state-of-the-art techniques: it is more robust in identifying candidate cavities because the curvature, obtained from the eigenvalues of the Hessian matrix, can be evaluated not only on the protein surface but at any point of the domain of the scalar field. In turn, this allows us to identify the critical points of the scalar field, which need not lie on the molecular surface itself.
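To make the Hessian-based curvature evaluation concrete, here is a minimal Python sketch. It assumes a sum-of-Gaussians (Blinn-style) density as the scalar field, with a hypothetical `blobbiness` parameter; the excerpt does not specify the exact field CriticalFinder uses. The Hessian is computed analytically atom by atom at an arbitrary point of the domain, and its eigenvalues, whose signs describe how the field bends along the principal directions, come from the closed-form trigonometric method for symmetric 3×3 matrices, chosen so the sketch needs only the standard library.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Atom = Tuple[Vec3, float]  # (center, radius); hypothetical input format


def field_hessian(p: Vec3, atoms: Sequence[Atom], blobbiness: float = 2.0) -> List[List[float]]:
    """Analytic Hessian of the assumed field f(p) = sum_i exp(-a_i |p - c_i|^2),
    with a_i = blobbiness / r_i^2 (a sum-of-Gaussians stand-in, not the paper's exact f)."""
    H = [[0.0] * 3 for _ in range(3)]
    for center, radius in atoms:
        a = blobbiness / (radius * radius)
        d = [p[k] - center[k] for k in range(3)]
        g = math.exp(-a * sum(x * x for x in d))
        for i in range(3):
            for j in range(3):
                # d2/(dx_i dx_j) of exp(-a|d|^2) = g * (4 a^2 d_i d_j - 2 a delta_ij)
                delta = 1.0 if i == j else 0.0
                H[i][j] += g * (4.0 * a * a * d[i] * d[j] - 2.0 * a * delta)
    return H


def symmetric_eigenvalues(A: List[List[float]]) -> List[float]:
    """Eigenvalues of a real symmetric 3x3 matrix (closed-form trigonometric method), ascending."""
    p1 = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    if p1 == 0.0:  # matrix is already diagonal
        return sorted([A[0][0], A[1][1], A[2][2]])
    p2 = sum((A[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    B = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
             - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
             + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding before acos
    phi = math.acos(r) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e_mid = 3.0 * q - e_max - e_min  # trace identity
    return sorted([e_min, e_mid, e_max])


atoms = [((0.0, 0.0, 0.0), 1.0)]
print(symmetric_eigenvalues(field_hessian((0.0, 0.0, 0.0), atoms)))  # [-4.0, -4.0, -4.0]
```

At the center of a single atom all three eigenvalues are negative, so the assumed field has a local maximum there; the same call works at any point of the box, on or off the surface.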
Our method, called CriticalFinder, is the first surface-based method to succeed in finding a meaningful segmentation of a molecular surface into cavities and saliences. More specifically, it relies on the theory of critical points (also known as Morse theory) to identify cavities on the protein surface. Although some previous works have used curvature information (see, for example, Natarajan et al. [6]), the resulting segmentations did not prove effective for cavity detection, because their charts (or segments) do not necessarily match protein cavities as tentative binding sites. Furthermore, to the best of our knowledge, CriticalFinder is the first cavity detection algorithm to take advantage of a loosely coupled cluster of computers, connected over a local area network (LAN) and equipped with Nvidia Tesla K40 graphics cards, to identify cavities on protein surfaces.
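In Morse theory, a critical point of a 3D scalar field is a point where the gradient vanishes; if the Hessian there is nonsingular, the point is classified by its index, the number of negative Hessian eigenvalues. The Python sketch below shows only this textbook classification. The tolerances `grad_tol` and `eig_tol` are hypothetical parameters, and how CriticalFinder maps each type of critical point to cavities or saliences is not reproduced here.

```python
import math
from typing import Sequence

MORSE_TYPES = {0: "minimum", 1: "index-1 saddle", 2: "index-2 saddle", 3: "maximum"}


def classify_point(gradient: Sequence[float], hessian_eigenvalues: Sequence[float],
                   grad_tol: float = 1e-6, eig_tol: float = 1e-9) -> str:
    """Classify a sample point of a 3D scalar field using Morse theory.

    Returns "regular" if the gradient does not vanish, "degenerate" if some
    Hessian eigenvalue is (numerically) zero, and otherwise the Morse type
    given by the index = number of negative eigenvalues.
    """
    if len(gradient) != 3 or len(hessian_eigenvalues) != 3:
        raise ValueError("expected a 3D gradient and three Hessian eigenvalues")
    if math.sqrt(sum(g * g for g in gradient)) > grad_tol:
        return "regular"
    if any(abs(ev) <= eig_tol for ev in hessian_eigenvalues):
        return "degenerate"
    index = sum(1 for ev in hessian_eigenvalues if ev < 0.0)
    return MORSE_TYPES[index]


print(classify_point((0.0, 0.0, 0.0), (-4.0, -4.0, -4.0)))  # maximum
print(classify_point((0.0, 0.0, 0.0), (-1.0, 2.0, 3.0)))    # index-1 saddle
print(classify_point((0.5, 0.0, 0.0), (1.0, 2.0, 3.0)))     # regular
```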
The remainder of our paper is organized as follows. Section 2 briefly surveys the most closely related work published in the literature. Section 3 describes the fundamentals of scalar fields and the theory of critical points underlying our algorithm. Section 4 describes our algorithm in detail, as well as its implementation. Section 5 briefly describes our technique to triangulate and visualize protein surfaces. Section 6 discusses the theoretical complexity of the algorithm. Section 7 describes the methodology followed in the optimization of the CUDA code. Section 8 presents the most relevant results produced by our method, including a comparison to other well-known algorithms in the literature. Section 9 draws the main conclusions and provides hints for future work.
Prior work
Intuitively, cavities (also called pockets) are concavities on protein surfaces, although their geometric definition is not straightforward [3]. Indeed, cavities range from small spherical invaginations to deep curved or linear clefts in the protein [7]. Interestingly, researchers have observed that ligands (drugs, in particular) commonly bind to the largest and/or deepest concavity on the protein surface [8]. On average, such a cavity might be three times as large as the ligand, which
Theory
This paper describes an algorithm that detects cavities using a scalar field and its critical points, within an axis-aligned box-shaped domain that encloses a given molecule.
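A minimal Python sketch of how such an axis-aligned domain might be built: take the extreme extents of all atom spheres, pad them, and derive the grid resolution from a sample spacing. The `margin` and `spacing` values, and the (center, radius) atom format, are assumptions for illustration; the excerpt does not state what padding or resolution CriticalFinder uses.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Atom = Tuple[Vec3, float]  # (center, radius); hypothetical input format


def bounding_box(atoms: Sequence[Atom], margin: float) -> Tuple[Vec3, Vec3]:
    """Axis-aligned box enclosing every atom sphere, padded by `margin` on each side."""
    if not atoms:
        raise ValueError("need at least one atom")
    lo = tuple(min(c[k] - r for c, r in atoms) - margin for k in range(3))
    hi = tuple(max(c[k] + r for c, r in atoms) + margin for k in range(3))
    return lo, hi


def grid_shape(lo: Vec3, hi: Vec3, spacing: float) -> Tuple[int, int, int]:
    """Number of grid samples per axis so that the samples cover [lo, hi] at the given spacing."""
    return tuple(int(math.ceil((hi[k] - lo[k]) / spacing)) + 1 for k in range(3))


atoms = [((0.0, 0.0, 0.0), 1.5), ((3.0, 0.0, 0.0), 1.0)]
lo, hi = bounding_box(atoms, margin=0.5)
print(lo, hi)                   # (-2.0, -2.0, -2.0) (4.5, 2.0, 2.0)
print(grid_shape(lo, hi, 0.5))  # (14, 9, 9)
```

Padding the box beyond the atom spheres keeps the molecular surface, and any nearby critical points, strictly inside the sampled domain.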
Cavity detection algorithm
The CriticalFinder algorithm belongs to the category of surface-based methods, though it takes advantage of the voxelization of the domain. This voxelization is also needed to triangulate and render the molecular surface with the marching cubes (MC) algorithm, originally introduced by Lorensen and Cline [37]; specifically, we use the MC variant described in [38]. Both CriticalFinder and the MC-based triangulation algorithm were designed and implemented to run on GPU via
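The per-voxel step that marching cubes performs on such a voxelized domain can be sketched in Python as follows: read the field at the eight corners of a voxel, build an 8-bit case index from the inside/outside test at each corner, and skip voxels whose corners are all inside or all outside. The corner ordering is one common convention that must match whatever edge and triangle lookup tables are used (those tables are omitted here), and the "inside means value ≥ iso" test assumes a field that is larger inside the molecule. This is a generic MC sketch, not the variant of [38].

```python
from typing import List, Sequence

# Corner offsets (di, dj, dk); this ordering must agree with the MC lookup tables.
CORNER_OFFSETS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
                  (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]


def corner_values(grid: List[List[List[float]]], i: int, j: int, k: int) -> List[float]:
    """Scalar-field samples at the 8 corners of voxel (i, j, k)."""
    return [grid[i + di][j + dj][k + dk] for di, dj, dk in CORNER_OFFSETS]


def cube_index(values: Sequence[float], iso: float) -> int:
    """8-bit case index: bit c is set when corner c lies inside (value >= iso).

    Assumes the field is larger inside the molecule; some MC codes use the opposite test.
    """
    if len(values) != 8:
        raise ValueError("a voxel has exactly 8 corners")
    index = 0
    for c, v in enumerate(values):
        if v >= iso:
            index |= 1 << c
    return index


def voxel_is_crossed(index: int) -> bool:
    """Only voxels with mixed inside/outside corners (index not 0 or 255) produce triangles."""
    return index not in (0, 255)
```

Because every voxel is processed independently, this step maps naturally onto one GPU thread per voxel.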
Surface triangulation and rendering
In order to render the molecular surface on screen, we have developed a triangulation algorithm that runs entirely on GPU. In essence, it is a variant of the marching cubes algorithm [37] adapted to triangulate molecular surfaces. Its particularity is that it is atom-centric, i.e., the value of the scalar field at any point of the domain is computed on a per-atom basis. The reader is referred to
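One possible reading of "computed on a per-atom basis" is a scatter scheme: each atom adds its contribution only to the grid samples within a cutoff radius of its center, instead of every sample summing over all atoms. The Python sketch below illustrates that idea under stated assumptions: a Gaussian contribution per atom, and hypothetical `blobbiness` and `cutoff_factor` parameters. It is not the authors' GPU kernel; on a GPU, concurrent additions from different atoms into the same sample would need atomic operations.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Atom = Tuple[Vec3, float]  # (center, radius); hypothetical input format


def atom_centric_field(atoms: Sequence[Atom], origin: Vec3, spacing: float,
                       shape: Tuple[int, int, int], blobbiness: float = 2.0,
                       cutoff_factor: float = 3.0) -> List[List[List[float]]]:
    """Accumulate an assumed Gaussian field atom by atom, touching only nearby samples."""
    nx, ny, nz = shape
    field = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for center, radius in atoms:
        a = blobbiness / (radius * radius)
        cutoff = cutoff_factor * radius
        # Index range of grid samples inside the atom's cutoff cube, clamped to the grid.
        lo = [max(0, math.floor((center[ax] - cutoff - origin[ax]) / spacing)) for ax in range(3)]
        hi = [min(shape[ax] - 1, math.ceil((center[ax] + cutoff - origin[ax]) / spacing))
              for ax in range(3)]
        for i in range(lo[0], hi[0] + 1):
            dx = origin[0] + i * spacing - center[0]
            for j in range(lo[1], hi[1] + 1):
                dy = origin[1] + j * spacing - center[1]
                for k in range(lo[2], hi[2] + 1):
                    dz = origin[2] + k * spacing - center[2]
                    d2 = dx * dx + dy * dy + dz * dz
                    if d2 <= cutoff * cutoff:  # skip cube corners outside the cutoff sphere
                        field[i][j][k] += math.exp(-a * d2)
    return field


grid = atom_centric_field([((0.0, 0.0, 0.0), 1.0)], origin=(-1.0, -1.0, -1.0),
                          spacing=1.0, shape=(3, 3, 3))
print(grid[1][1][1])  # 1.0 (the sample at the atom center)
```

The benefit of the cutoff is that the cost per atom depends only on the number of samples near it, not on the size of the whole grid.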
Complexity analysis
The theoretical complexity of CriticalFinder depends mainly on the bounding box, the array of atoms, and how the four kernels use and access these data structures in memory. In general, the computations are performed per voxel, as is the case for the first kernel (Algorithm 1), so that it takes time and space. For voxels, the complexity is thus in time and space. However, since each kernel is executed in parallel, one thread per voxel, the theoretical
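The "one thread per voxel" execution model can be emulated sequentially in Python to show how a flat CUDA-style thread index, `blockIdx.x * blockDim.x + threadIdx.x`, maps onto voxel coordinates. The block size and the memory layout in which `k` varies fastest are assumptions, not values taken from the paper. The emulation also illustrates the shape of the complexity argument: the total work stays proportional to the number of voxels N, while the launch needs only ⌈N / blockDim⌉ blocks, which the GPU schedules in parallel.

```python
from typing import Optional, Tuple

Shape = Tuple[int, int, int]


def launch_config(n_voxels: int, threads_per_block: int = 256) -> Tuple[int, int]:
    """(blocks, threads_per_block) for a 1D launch with one thread per voxel."""
    blocks = (n_voxels + threads_per_block - 1) // threads_per_block  # ceiling division
    return blocks, threads_per_block


def thread_to_voxel(block_idx: int, block_dim: int, thread_idx: int,
                    shape: Shape) -> Optional[Tuple[int, int, int]]:
    """Map a CUDA-style (blockIdx.x, blockDim.x, threadIdx.x) triple to voxel (i, j, k).

    Assumes a row-major layout in which k varies fastest. Returns None for the
    surplus threads of the last block, which a real kernel would skip with an
    early return.
    """
    nx, ny, nz = shape
    tid = block_idx * block_dim + thread_idx
    if tid >= nx * ny * nz:
        return None
    return tid // (ny * nz), (tid // nz) % ny, tid % nz


shape = (4, 5, 6)
blocks, tpb = launch_config(4 * 5 * 6, threads_per_block=32)
visited = [thread_to_voxel(b, tpb, t, shape) for b in range(blocks) for t in range(tpb)]
voxels = [v for v in visited if v is not None]
print(blocks, len(voxels), len(set(voxels)))  # 4 120 120
```

Every voxel is visited exactly once, and the eight surplus threads of the last block do nothing.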
CUDA code optimization
The main idea behind our hardware/software setup was to run the cavity detection code of CriticalFinder on Nvidia Tesla K40 graphics cards, while an Nvidia Quadro K5000 graphics card was used only for rendering and visualizing molecular surfaces and their cavities. Therefore, we only needed to optimize the code for the Tesla K40.
Hardware/software
For testing, we used a local area network (LAN) of six GPU-enabled PCs running the 64-bit version of the Fedora 20 Linux operating system, each powered by an Intel Core i7-4820K processor (3.70 GHz clock) and 32 GB of RAM. The first PC incorporates a single Nvidia Quadro K5000 (with 4 GB of memory) exclusively for visualization and graphics output, whereas each of the other PCs was equipped with two Nvidia Tesla K40 cards exclusively for GPU computations, as necessary for
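With one PC reserved for display and two Tesla K40s in each of the other five PCs, ten GPUs are available for computation. A simple way to spread a voxel grid over them is to cut it into contiguous z-slabs of near-equal thickness, as in the Python sketch below. The slab scheme, the `(pc, device)` identifiers, and the PC numbering are assumptions for illustration; the excerpt does not say how CriticalFinder actually distributes work across the cluster. A real implementation would also need halo slices if a kernel reads neighboring samples, for example for finite-difference derivatives.

```python
from typing import Dict, List, Tuple


def compute_gpus(first_compute_pc: int = 2, n_compute_pcs: int = 5,
                 gpus_per_pc: int = 2) -> List[Tuple[int, int]]:
    """Hypothetical (pc, device) identifiers for the Tesla K40s; PC 1 only renders."""
    return [(pc, dev)
            for pc in range(first_compute_pc, first_compute_pc + n_compute_pcs)
            for dev in range(gpus_per_pc)]


def partition_slabs(nz: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split z-slices [0, nz) into n_parts half-open slabs whose sizes differ by at most one."""
    base, extra = divmod(nz, n_parts)
    slabs, start = [], 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)
        slabs.append((start, start + size))
        start += size
    return slabs


gpus = compute_gpus()
assignment: Dict[Tuple[int, int], Tuple[int, int]] = dict(zip(gpus, partition_slabs(103, len(gpus))))
print(len(gpus))           # 10
print(assignment[(2, 0)])  # (0, 11)
print(assignment[(6, 1)])  # (93, 103)
```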
Concluding remarks
We have developed a novel surface-based algorithm that identifies cavities on a molecular surface using the theory of critical points. To the best of our knowledge, this is the first surface-based algorithm that successfully detects and delineates cavities, as is usual in other categories of algorithms. Other surface-based algorithms have tried to use the concept of curvature to detect cavities on the surface, but with little success, because the resulting
Acknowledgments
The authors are very grateful to the anonymous reviewers for their valuable suggestions, which significantly improved the paper. This research has been partially supported by the Portuguese Research Council (Fundação para a Ciência e Tecnologia), under FCT Project UTAP-EXPL/QEQ-COM/0019/2014 and FCT Project UID/EEA/50008/2013. We gratefully acknowledge the support of NVIDIA Corporation for donating the Nvidia Quadro K5000 and Tesla K40 graphics cards.
Sérgio Dias obtained his B.Sc. (2008), M.Sc. (2010) and Ph.D. (2015) degrees in Computer Science and Engineering from University of Beira Interior, Portugal. He is currently a postdoc researcher at the Instituto de Telecomunicações, University of Beira Interior. His research interests include high-performance computing, with applications to computational biology, geometry, and visualization.
References (51)
- et al., Rapid atomic density methods for molecular shape characterization, J. Mol. Graph. Model. (2001)
- et al., Segmenting molecular surfaces, Comput. Aided Geom. Design (2006)
- et al., Structure-based computational analysis of protein binding sites for function and druggability prediction, J. Biotechnol. (2012)
- et al., Druggable pockets and binding site centric chemical space: a paradigm shift in drug discovery, Drug Discov. Today (2010)
- Finding and filling protein cavities using cellular logic operations, J. Mol. Graph. (1992)
- et al., Detection and geometric modeling of molecular surfaces and cavities using digital mathematical morphological operations, J. Mol. Graph. (1995)
- et al., LigandFit: a novel method for the shape-directed rapid docking of ligands to protein active sites, J. Mol. Graph. Model. (2003)
- et al., A geometric approach to macromolecule-ligand interactions, J. Mol. Biol. (1982)
- Surfnet: A program for visualizing molecular surfaces, cavities, and intermolecular interactions, J. Mol. Graph. (1995)
- et al., On the definition and the construction of pockets in macromolecules, Discrete Appl. Math. (1998)
- Analytically defined surfaces to analyze molecular interaction properties, J. Mol. Graph.
- Quality meshing of implicit solvation models of biomolecular structures, Comput. Aided Geom. Design
- A continuation algorithm for planar implicit curves with singularities, Comput. Graph.
- Triangulating molecular surfaces over a LAN of GPU-enabled computers, J. Parallel Comput.
- LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins, J. Mol. Graph. Model.
- Shape in Chemistry: An Introduction to Molecular Shape and Topology
- Computational approaches to identifying and characterizing protein binding sites for ligand design, J. Mol. Recognit.
- Detection of pockets on protein surfaces using small and large probe spheres to find putative ligand binding sites, Proteins: Struct., Funct., Bioinformatics
- Molecular shape analysis based upon the Morse-Smale complex and the Connolly function
- Protein pockets: Inventory, shape, and comparison, J. Chem. Inf. Model.
- On the nature of cavities on protein surfaces: Application to the identification of drug-binding sites, Proteins: Struct., Funct., Bioinformatics
- Large-scale comparison of four binding site detection algorithms, J. Chem. Inf. Model.
- Pocket-based drug design: Exploring pocket space, AAPS J.
- Protein clefts in molecular recognition and function, Protein Sci.
- Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites, Bioinformatics
Cited by (18)
- Boosting analyses in the life sciences via clusters, grids and clouds, Future Generation Computer Systems (2017). Citation excerpt: Consequently, the presented algorithm can enable the usage of exact algorithms for solving the Hitting Set problem and applying it to real world problems. The paper “Multi-GPU-Based Detection of Protein Cavities using Critical Points” [17], by Duarte et al., introduces a geometric method for detecting cavities on the molecular surface based on the theory of critical points. The method, called CriticalFinder, differs from other surface-based methods found in the literature because it directly takes advantage of the curvature of the scalar field (or function), which represents the molecular surface, instead of evaluating the curvature of the Connolly function over the molecular surface.
- Applications of machine learning in computer-aided drug discovery, QRB Discovery (2022)
- RefinePocket: An Attention-Enhanced and Mask-Guided Deep Learning Approach for Protein Binding Site Prediction, IEEE/ACM Transactions on Computational Biology and Bioinformatics (2023)
- SiteFerret: Beyond Simple Pocket Identification in Proteins, Journal of Chemical Theory and Computation (2023)
Quoc Trong Nguyen obtained his B.Sc. (2014) degree in Computer Science from HoChiMinh City University of Pedagogy, Vietnam. He is currently an M.Sc. student and junior researcher at the Instituto de Telecomunicações, University of Beira Interior. His research interests include geometric computing, molecular graphics, and visualization.
Joaquim A. Jorge is a Full Professor in Computer Graphics at the Instituto Superior Técnico, University of Lisbon, Portugal. He received his Ph.D. from Rensselaer Polytechnic Institute in 1995 and coordinates the VIMMI research group at INESC-ID. He is Editor-in-Chief of the Computers and Graphics journal (Elsevier), a Fellow of the Eurographics Association, and a Senior Member of ACM and IEEE; he serves on the ACM Europe Council and chairs the ACM/SIGGRAPH Specialized Conferences Committee. His research interests include multimodal user interfaces, advanced 3D visualization, and learning techniques.
Abel J.P. Gomes is an Associate Professor in Computer Graphics at the University of Beira Interior, Portugal. He obtained a Ph.D. degree in geometric modeling from Brunel University, England, in 2000. He has over 100 publications, including journal and conference articles, and one book published by Springer-Verlag. He was Head of the Department of Computer Science and Engineering, University of Beira Interior, Portugal, and the leader of a research unit of Instituto de Telecomunicações, one of the largest research centers in Portugal. He is also a licensed Professional Engineer and member of the IEEE, ACM, and Eurographics. His current research interests include computer graphics algorithms, molecular graphics, geometric computing, and implicit curves and surfaces.