Abstract
An accurate description of protein binding sites is essential for determining binding site similarity and for applying machine learning methods that relate binding sites to observed functions. This work describes CAVIAR, a new open-source tool for generating binding site descriptors from protein structures in PDB and mmCIF format as well as from trajectory frames of molecular dynamics simulations. The applicability of CAVIAR descriptors is showcased by computing machine learning predictions of binding site ligandability. The method can also automatically assign subcavities, even in the absence of a bound ligand. The resulting subpockets mimic the empirical definitions used in medicinal chemistry projects. The experimental binding affinity is shown to scale relatively well with the number of subcavities filled by the ligand: compounds binding to more than three subcavities have nanomolar or better affinities for the target. The CAVIAR descriptors and methods can be used in any machine learning-based investigation of problems involving binding sites, from protein engineering to hit identification. The full software code is available on GitHub and a conda package is hosted on Anaconda Cloud.
Data availability
All data are publicly available on GitHub.
Code availability
Installation notes, a user manual and support for CAVIAR are available at https://jr-marchand.github.io/caviar/. The GitHub repository at https://github.com/jr-marchand/caviar hosts the CAVIAR source code and validation sets. A conda package is hosted on Anaconda Cloud at https://anaconda.org/jr-marchand/caviar. Source code and data are available under an MIT license.
Acknowledgements
The authors thank Imtiaz Hossein, Michael Schaefer and Richard Lewis for insightful discussions. J.-R.M. thanks the ProDy development team and generally all contributors to open source codes for their crucial work.
Funding
This work was supported by the postdoctoral office of the Novartis Institutes for Biomedical Research.
Author information
Contributions
The study was designed by all authors. JRM wrote the software and performed the analysis. JRM and FS analyzed the results. The manuscript was written by JRM and FS. All authors have given approval to the final version of the manuscript.
Ethics declarations
Conflict of interest
We declare no conflict of interest.
About this article
Cite this article
Marchand, JR., Pirard, B., Ertl, P. et al. CAVIAR: a method for automatic cavity detection, description and decomposition into subcavities. J Comput Aided Mol Des 35, 737–750 (2021). https://doi.org/10.1007/s10822-021-00390-w
DOI: https://doi.org/10.1007/s10822-021-00390-w