Multi-GPU-based detection of protein cavities using critical points

doi:10.1016/j.future.2016.07.009

Future Generation Computer Systems

Volume 67, February 2017, Pages 430-440

https://doi.org/10.1016/j.future.2016.07.009

Highlights

• CriticalFinder is the first multi-GPU-based cavity detection algorithm.
• CriticalFinder is the first surface-based cavity detection algorithm that produces a meaningful, coarse cavity segmentation.
• CriticalFinder relies on the theory of critical points (i.e., Morse theory).

Abstract

Protein cavities are specific regions on the protein surface where ligands (small molecules) may bind. Such cavities are putative binding sites of proteins for ligands. Usually, cavities correspond to voids, pockets, and depressions of molecular surfaces. Locating such cavities is important to better understand protein functions, as needed in, for example, structure-based drug design. This article introduces a geometric method for detecting cavities on the molecular surface based on the theory of critical points. The method, called CriticalFinder, differs from other surface-based methods found in the literature because it directly uses the curvature of the scalar field (or function) that represents the molecular surface, instead of evaluating the curvature of the Connolly function over the molecular surface. To evaluate the accuracy of CriticalFinder, we compare it to seven other geometric methods (i.e., LIGSITE$^{CS}$, GHECOM, ConCavity, POCASA, SURFNET, PASS, and Fpocket). The benchmark results show that CriticalFinder outperforms those methods in terms of accuracy. In addition, we analyze the performance of the GPU implementation of CriticalFinder in terms of time consumption and memory occupancy.

Introduction

Many biological processes in the life sciences, in particular those involving drug interactions and protein docking, occur in water. The interaction between water and a molecule reveals much about the shape of the molecule, including the location of its binding sites. As Mezey noted in [1], this is of great importance to research in chemistry, biophysics, medicine, and nanotechnology. A better interpretation and identification of such regions on a molecular surface can greatly help in discovering new drugs. Hence, the identification of those binding sites is often the first step in the study of protein functions, as in structure-based drug design.

However, many small molecules (i.e., ligands) can bind to a given protein, depending on the number of binding sites on its molecular surface. As noted by Henrich et al. [2], checking in the laboratory whether a certain molecule can bind to a particular protein takes a lot of time. While, in general, binding sites correspond to concave, cleft or tunnel-shaped regions on a protein surface (cf. Kawabata and Go [3]), called pockets or cavities, not all cavities end up being binding sites for small ligands. Thus, detecting binding sites depends on efficient computational algorithms to locate all cavities on the molecular surface.

In this paper, we describe a method to identify the cavities on the protein surface as tentative binding sites for ligands. The novelty of the algorithm lies in directly evaluating the curvature of the scalar field (or function) that describes the molecular surface, instead of evaluating the curvature of the Connolly function [4] or the Mitchell–Kerr–Eyck function [5] over the molecular surface. This gives the method an advantage over state-of-the-art techniques: it is more robust in identifying candidate cavities because the curvature can be evaluated not only on the protein surface, but also at any point of the domain of the scalar field, from the eigenvalues of the Hessian matrix; hence, we are able to identify the critical points of the scalar field.
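For readers less familiar with the terminology, the standard definitions behind this paragraph can be written compactly. The symbols $f$ (the scalar field), $H_f$ (its Hessian) and $\lambda_i$ (the Hessian eigenvalues) are notation chosen here for illustration and are not necessarily the notation of the full paper.

$$
\nabla f(p) = 0, \qquad
H_f(p) = \left[ \frac{\partial^{2} f}{\partial x_i \, \partial x_j}(p) \right]_{i,j=1}^{3}, \qquad
\lambda_1 \le \lambda_2 \le \lambda_3 .
$$

A point $p$ with $\nabla f(p) = 0$ is a critical point; it is non-degenerate when $\lambda_1 \lambda_2 \lambda_3 \neq 0$. Its Morse index is the number of negative eigenvalues of $H_f(p)$: index 0 is a local minimum, indices 1 and 2 are saddles, and index 3 is a local maximum.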

Indeed, CriticalFinder is the first surface-based method to succeed in finding a meaningful segmentation of a molecular surface into cavities and saliences. More specifically, our method relies on the theory of critical points (also called Morse theory) to identify cavities on the protein surface. While some research works have already tried to use curvature information (see, for example, Natarajan et al. [6]), the resulting segmentations did not prove effective for cavity detection purposes, because their charts (or segments) do not necessarily match protein cavities as tentative binding sites. Furthermore, to the best of our knowledge, CriticalFinder is the first cavity detection algorithm to take advantage of a loosely-coupled GPU cluster of computers equipped with Nvidia Tesla K40 graphics cards, over a local area network (LAN), to identify cavities on protein surfaces.

The remainder of our paper is organized as follows. Section 2 briefly surveys the most closely related work published in the literature. Section 3 describes the fundamentals of scalar field theory and theory of critical points underlying our algorithm. Section 4 describes our algorithm in detail, as well as its implementation. Section 5 briefly describes our technique to triangulate and visualize protein surfaces. Section 6 discusses the theoretical complexity of the algorithm. Section 7 describes the methodology followed in the optimization of the CUDA code. Section 8 contains the most relevant results produced by our method, including a comparison to other well-known algorithms found in the literature. Section 9 discusses the main conclusions, while providing relevant hints for future work.

Section snippets

Prior work

Intuitively, cavities (also called pockets) are concavities on protein surfaces, although their geometrical definition is not straightforward [3]. Indeed, cavities range from small spherical invaginations to deep curved or linear clefts in the protein [7]. Interestingly, researchers have observed that ligands (drugs, in particular) commonly bind into the largest and/or deepest concavity on the protein surface [8]. On average, such a cavity might be three times as large as the ligand, which

Theory

This paper describes an algorithm for detecting cavities using scalar fields and their critical points within an axis-aligned boxed domain $D \subset \mathbb{R}^{3}$ that encloses a given molecule.
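As a rough illustration of what classifying a critical point of a scalar field involves, the sketch below evaluates a field at a point, estimates its gradient and Hessian by central differences, and reads off the type of critical point from the signs of the Hessian eigenvalues. Everything specific here is an assumption made for the example: the sum-of-Gaussians field, the `decay` parameter, the step size, the tolerances, and all function names are hypothetical and are not taken from CriticalFinder, which runs on GPUs.

```python
import math

def field(p, atoms, decay=1.0):
    """Illustrative scalar field (assumed): one Gaussian bump per atom centre."""
    return sum(math.exp(-decay * sum((pi - ci) ** 2 for pi, ci in zip(p, c)))
               for c in atoms)

def shifted(p, axis, delta):
    """Copy of point p moved by delta along one coordinate axis."""
    q = list(p)
    q[axis] += delta
    return q

def gradient(f, p, h=1e-3):
    """Central-difference estimate of the gradient of f at p."""
    return [(f(shifted(p, i, h)) - f(shifted(p, i, -h))) / (2 * h) for i in range(3)]

def hessian(f, p, h=1e-3):
    """Central-difference estimate of the symmetric 3x3 Hessian of f at p."""
    H = [[0.0] * 3 for _ in range(3)]
    f0 = f(p)
    for i in range(3):
        H[i][i] = (f(shifted(p, i, h)) - 2 * f0 + f(shifted(p, i, -h))) / h ** 2
        for j in range(i + 1, 3):
            pp = f(shifted(shifted(p, i, h), j, h))
            pm = f(shifted(shifted(p, i, h), j, -h))
            mp = f(shifted(shifted(p, i, -h), j, h))
            mm = f(shifted(shifted(p, i, -h), j, -h))
            H[i][j] = H[j][i] = (pp - pm - mp + mm) / (4 * h ** 2)
    return H

def eigenvalues_sym3(A):
    """Eigenvalues of a symmetric 3x3 matrix, ascending (trigonometric closed form)."""
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    off = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    p = math.sqrt((sum((A[i][i] - q) ** 2 for i in range(3)) + 2.0 * off) / 6.0)
    if p == 0.0:
        return [q, q, q]  # A is a multiple of the identity
    B = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
           - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
           + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det / 2.0))) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return sorted([e_min, 3.0 * q - e_min - e_max, e_max])

def classify(f, p, grad_tol=1e-6, eig_tol=1e-6):
    """Label p as regular, degenerate, or by Morse index (negative eigenvalue count)."""
    if max(abs(g) for g in gradient(f, p)) > grad_tol:
        return "regular"
    eig = eigenvalues_sym3(hessian(f, p))
    if any(abs(e) < eig_tol for e in eig):
        return "degenerate"
    index = sum(1 for e in eig if e < 0)
    return ["minimum", "1-saddle", "2-saddle", "maximum"][index]

# One atom: its centre is a maximum of the field.
print(classify(lambda p: field(p, [(0.0, 0.0, 0.0)]), (0.0, 0.0, 0.0)))  # maximum
# Two well-separated atoms: the midpoint is a saddle with two negative eigenvalues.
two = [(-1.5, 0.0, 0.0), (1.5, 0.0, 0.0)]
print(classify(lambda p: field(p, two), (0.0, 0.0, 0.0)))  # 2-saddle
```

In the two-atom case the field rises towards either atom along the connecting axis and falls away in the two perpendicular directions, which is why the midpoint has exactly two negative eigenvalues. Which kinds of critical points CriticalFinder associates with cavities is described in the full paper and is not assumed here.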

Cavity detection algorithm

The CriticalFinder algorithm belongs to the category of surface-based methods, though it takes advantage of the voxelization of the domain $D \subset \mathbb{R}^{3}$. This voxelization is also needed to triangulate and render the molecular surface through the marching cubes (MC) algorithm, which was originally introduced by Lorensen and Cline [37]. Nevertheless, we use the MC variant described in [38]. Both CriticalFinder and MC-based triangulation algorithms were designed and implemented to run on GPU via

Surface triangulation and rendering

To render the molecular surface on screen, we have developed a triangulation algorithm that runs entirely on the GPU. In essence, it is a variant of the marching cubes algorithm [37], which is used here to triangulate molecular surfaces. Its particularities stem from the fact that it is an atom-centric triangulation algorithm for molecular surfaces, i.e., the value of the scalar field at any point of the domain is computed on a per-atom basis. The reader is referred to

Complexity analysis

The theoretical complexity of CriticalFinder depends mainly on the $I \times J \times K$ bounding box, the array $A$ of $n$ atoms, and how the four kernels use and access these data structures in memory. In general, the computations are performed per voxel, as is the case for the first kernel (Algorithm 1), where each voxel takes $O (1)$ time and space. For $I \times J \times K$ voxels, the complexity is thus $O (I \times J \times K)$ in time and space. But, since each kernel is executed in parallel, one thread per voxel, the theoretical

CUDA code optimization

The leading idea of our hardware/software setup was to run the cavity detection code of CriticalFinder on Nvidia Tesla K40 graphics cards, while an Nvidia Quadro K5000 graphics card was used only for rendering and visualization of molecular surfaces and their cavities. Therefore, we only needed to optimize the code for the Tesla K40.

Hardware/software

In testing, we used a LAN (Local Area Network) of six GPU-enabled PCs running the Fedora 20 (64-bit) Linux operating system, with each PC powered by an Intel Core i7-4820K processor, 3.70 GHz clock, and 32 GB RAM. The first PC incorporates a single Nvidia Quadro K5000 (with 4 GB memory) exclusively for visualization and graphics output, whereas each one of the other PCs was equipped with two Nvidia Tesla K40 cards exclusively for GPU computations, as necessary for

Concluding remarks

We have developed a novel surface-based algorithm to identify cavities over a molecular surface using the theory of critical points. To the best of our knowledge, this is the first surface-based algorithm that successfully detects and delineates cavities, as is usual in other categories of algorithms. It is true that other surface-based algorithms have tried to use the concept of curvature to detect cavities on the surface, but they have had little success in this challenge, because the resulting

Acknowledgments

The authors are very grateful to the anonymous reviewers for their valuable suggestions, which contributed significantly to improving the paper. This research has been partially supported by the Portuguese Research Council (Fundação para a Ciência e Tecnologia), under the FCT Project UTAP-EXPL/QEQ-COM/0019/2014 and FCT Project UID/EEA/50008/2013. We gratefully acknowledge the support of NVIDIA Corporation for their donation of an Nvidia Quadro K5000 and Tesla K40 graphics cards.

Sérgio Dias obtained his B.Sc. (2008), M.Sc. (2010) and Ph.D. (2015) degrees in Computer Science and Engineering from University of Beira Interior, Portugal. He is currently a postdoc researcher at the Instituto de Telecomunicações, University of Beira Interior. His research interests include high-performance computing, with applications to computational biology, geometry, and visualization.

References (51)

J.C. Mitchell et al. Rapid atomic density methods for molecular shape characterization. J. Mol. Graph. Model. (2001)
V. Natarajan et al. Segmenting molecular surfaces. Comput. Aided Geom. Design (2006)
B. Nisius et al. Structure-based computational analysis of protein binding sites for function and druggability prediction. J. Biotechnol. (2012)
S. Pérot et al. Druggable pockets and binding site centric chemical space: a paradigm shift in drug discovery. Drug Discov. Today (2010)
J.S. Delaney. Finding and filling protein cavities using cellular logic operations. J. Mol. Graph. (1992)
M. Masuya et al. Detection and geometric modeling of molecular surfaces and cavities using digital mathematical morphological operations. J. Mol. Graph. (1995)
C. Venkatachalam et al. LigandFit: a novel method for the shape-directed rapid docking of ligands to protein active sites. J. Mol. Graph. Model. (2003)
I.D. Kuntz et al. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. (1982)
R.A. Laskowski. Surfnet: A program for visualizing molecular surfaces, cavities, and intermolecular interactions. J. Mol. Graph. (1995)
H. Edelsbrunner et al. On the definition and the construction of pockets in macromolecules. Discrete Appl. Math. (1998)
R. Gabdoulline et al. Analytically defined surfaces to analyze molecular interaction properties. J. Mol. Graph. (1996)
Y. Zhang et al. Quality meshing of implicit solvation models of biomolecular structures. Comput. Aided Geom. Design (2006)
A.J. Gomes. A continuation algorithm for planar implicit curves with singularities. Comput. Graph. (2014)
S. Dias et al. Triangulating molecular surfaces over a LAN of GPU-enabled computers. J. Parallel Comput. (2015)
M. Hendlich et al. LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J. Mol. Graph. Model. (1997)
P.G. Mezey. Shape in Chemistry: An Introduction to Molecular Shape and Topology (1993)
S. Henrich et al. Computational approaches to identifying and characterizing protein binding sites for ligand design. J. Mol. Recognit. (2010)
T. Kawabata et al. Detection of pockets on protein surfaces using small and large probe spheres to find putative ligand binding sites. Proteins: Struct., Funct., Bioinformatics (2007)
F. Cazals et al. Molecular shape analysis based upon the Morse-Smale complex and the Connolly function
R. Coleman et al. Protein pockets: Inventory, shape, and comparison. J. Chem. Inf. Model. (2010)
M. Nayal et al. On the nature of cavities on protein surfaces: Application to the identification of drug-binding sites. Proteins: Struct., Funct., Bioinformatics (2006)
P. Schmidtke et al. Large-scale comparison of four binding site detection algorithms. J. Chem. Inf. Model. (2010)
X. Zheng et al. Pocket-based drug design: Exploring pocket space. AAPS J. (2013)
R.A. Laskowski et al. Protein clefts in molecular recognition and function. Protein Sci. (1996)
A.T.R. Laurie et al. Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics (2005)

Quoc Trong Nguyen obtained his B.Sc. (2014) degree in Computer Science from HoChiMinh City University of Pedagogy, Vietnam. He is currently an M.Sc. student and junior researcher at the Instituto de Telecomunicações, University of Beira Interior. His research interests include geometric computing, molecular graphics, and visualization.

Joaquim A. Jorge is a Full Professor in Computer Graphics at the Instituto Superior Técnico, University of Lisbon, Portugal. He received his Ph.D. from Rensselaer Polytechnic Institute in 1995 and coordinates the VIMMI research group at INESC-ID. He is Editor-in-Chief of the Computers and Graphics Journal (Elsevier), a Fellow of the Eurographics Association and Senior Member of ACM and IEEE, serves on the ACM Europe Council, and chairs the ACM/SIGGRAPH Specialized Conferences Committee. His research interests include multimodal user interfaces, advanced 3D visualization and learning techniques.

Abel J.P. Gomes is an Associate Professor in Computer Graphics at the University of Beira Interior, Portugal. He obtained a Ph.D. degree in geometric modeling from Brunel University, England, in 2000. He has over 100 publications, including journal and conference articles, and 1 book published by Springer-Verlag. He was Head of the Department of Computer Science and Engineering, University of Beira Interior, Portugal, and the leader of a research unit of Instituto de Telecomunicações, which is one of the biggest research centers in Portugal. He is also a licensed Professional Engineer and member of the IEEE, ACM, and Eurographics. His current research interests include computer graphics algorithms, molecular graphics, geometric computing, and implicit curves and surfaces.
