Evaluating protein binding interfaces with transformer networks

Stebliankin, Vitalii; Shirali, Azam; Baral, Prabin; Shi, Jimeng; Chapagain, Prem; Mathee, Kalai; Narasimhan, Giri

doi:10.1038/s42256-023-00715-4

Article
Published: 07 September 2023

Nature Machine Intelligence volume 5, pages 1042–1053 (2023)

A preprint version of the article is available at bioRxiv.

Abstract

Computational protein-binding studies are widely used to investigate fundamental biological processes and facilitate the development of modern drugs, vaccines and therapeutics. Scoring functions aim to assess and rank the binding strength of predicted protein complexes. However, accurate scoring of protein binding interfaces remains a challenge. Here we show that our approach, evaluating Protein binding Interfaces with Transformer Networks (PIsToN), can distinguish native-like protein complexes from incorrect conformations. Protein interfaces are transformed into a collection of two-dimensional images (interface maps), each corresponding to a geometric or biochemical property. Pixel intensities represent the feature values. A neural network was adapted from a popular vision transformer with several enhancements: a hybrid component to accept empirical-based energy terms, a multi-attention module to highlight essential features and binding sites, and the use of contrastive learning for better ranking performance. The resulting PIsToN model substantially outperforms state-of-the-art scoring functions on well-known datasets.
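
The interface-map idea described above can be illustrated with a minimal sketch. It is not the PIsToN implementation: the feature names, the 4 x 4 grid size and the patch size of 2 are assumptions chosen for illustration only, and the helper functions `stack_channels` and `to_patch_tokens` are hypothetical. The sketch stacks one image per property into a multi-channel map and cuts it into non-overlapping patches, which is how a vision transformer typically turns an image into a sequence of tokens.

```python
from typing import Dict, List

Grid = List[List[float]]  # one H x W feature image (pixel = feature value)


def stack_channels(features: Dict[str, Grid]) -> List[Grid]:
    """Stack per-feature grids into a C x H x W map, channels sorted by name."""
    names = sorted(features)
    height = len(features[names[0]])
    width = len(features[names[0]][0])
    for name in names:
        grid = features[name]
        if len(grid) != height or any(len(row) != width for row in grid):
            raise ValueError(f"feature {name!r} has a different shape")
    return [features[name] for name in names]


def to_patch_tokens(image: List[Grid], patch: int) -> List[List[float]]:
    """Cut a C x H x W image into non-overlapping patch x patch tiles and
    flatten each tile (across all channels) into one token vector."""
    height, width = len(image[0]), len(image[0][0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [
                channel[r][c]
                for channel in image
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            tokens.append(token)
    return tokens


# Hypothetical 4 x 4 maps for two made-up feature names (assumed values).
features = {
    "hydrophobicity": [[0.1 * (r + c) for c in range(4)] for r in range(4)],
    "shape_index": [[float(r == c) for c in range(4)] for r in range(4)],
}
interface_map = stack_channels(features)
tokens = to_patch_tokens(interface_map, patch=2)
print(len(tokens), len(tokens[0]))  # 4 tokens, each of length 2 * 2 * 2 = 8
```

In a full model, each token would then be linearly projected and passed through transformer layers; how PIsToN combines these tokens with energy terms and attention is described in the article itself and is not reproduced here.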

Data availability

The lists of training/testing protein complexes and pre-computed interface maps are available on Zenodo at https://doi.org/10.5281/zenodo.7948337.

Code availability

The PIsToN software and benchmark datasets are available under license from https://biorg.cis.fiu.edu/piston/. The specific version of PIsToN used in this study is available via Zenodo⁸⁴.

References

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Callaway, E. After AlphaFold: protein-folding contest seeks next big breakthrough. Nature 604, 234–238 (2022).
Vakser, I. A. Protein–protein docking: from interaction to interactome. Biophys. J. 107, 1785–1793 (2014).
Shin, W.-H., Christoffer, C. W. & Kihara, D. In silico structure-based approaches to discover protein–protein interaction-targeting drugs. Methods 131, 22–32 (2017).
Meng, X.-Y., Zhang, H.-X., Mezei, M. & Cui, M. Molecular docking: a powerful approach for structure-based drug discovery. Curr. Comput. Aided Drug Des. 7, 146–157 (2011).
Scior, T. et al. Recognizing pitfalls in virtual screening: a critical review. J. Chem. Inf. Model. 52, 867–881 (2012).
Gupta, M., Sharma, R. & Kumar, A. Docking techniques in pharmacology: how much promising? Comput. Biol. Chem. 76, 210–217 (2018).
Huang, S.-Y. Search strategies and evaluation in protein–protein docking: principles, advances and challenges. Drug Discov. Today 19, 1081–1096 (2014).
Shen, C. et al. From machine learning to deep learning: advances in scoring functions for protein–ligand docking. Wiley Interdiscip. Rev. Comput. Mol. Sci. 10, e1429 (2020).
Moal, I. H., Torchala, M., Bates, P. A. & Fernández-Recio, J. The scoring of poses in protein–protein docking: current capabilities and future directions. BMC Bioinformatics 14, 286 (2013).
Adeshina, Y. O., Deeds, E. J. & Karanicolas, J. Machine learning classification can reduce false positives in structure-based virtual screening. Proc. Natl Acad. Sci. USA 117, 18477–18488 (2020).
Li, J., Fu, A. & Zhang, L. An overview of scoring functions used for protein–ligand interactions in molecular docking. Interdiscip. Sci. Comput. Life Sci. 11, 320–328 (2019).
Kortemme, T. & Baker, D. A simple physical model for binding energy hot spots in protein–protein complexes. Proc. Natl Acad. Sci. USA 99, 14116–14121 (2002).
Liu, X., Peng, L. & Zhang, J. Z. Accurate and efficient calculation of protein–protein binding free energy-interaction entropy with residue type-specific dielectric constants. J. Chem. Inf. Model. 59, 272–281 (2018).
Huang, S.-Y., Grinter, S. Z. & Zou, X. Scoring functions and their evaluation methods for protein–ligand docking: recent advances and future directions. Phys. Chem. Chem. Phys. 12, 12899–12908 (2010).
Durham, E., Dorr, B., Woetzel, N., Staritzbichler, R. & Meiler, J. Solvent accessible surface area approximations for rapid and accurate protein structure prediction. J. Mol. Model. 15, 1093–1108 (2009).
Eldridge, M. D., Murray, C. W., Auton, T. R., Paolini, G. V. & Mee, R. P. Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J. Comput. Aided Mol. Des. 11, 425–445 (1997).
Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).
Chen, Y.-C. Beware of docking! Trends Pharmacol. Sci. 36, 78–95 (2015).
Muley, L. et al. Enhancement of hydrophobic interactions and hydrogen bond strength by cooperativity: synthesis, modeling, and molecular dynamics simulations of a congeneric series of thrombin inhibitors. J. Med. Chem. 53, 2126–2135 (2010).
Liu, J. & Wang, R. Classification of current scoring functions. J. Chem. Inf. Model. 55, 475–482 (2015).
Kinnings, S. L. et al. A machine learning-based method to improve docking scoring functions and its application to drug repurposing. J. Chem. Inf. Model. 51, 408–419 (2011).
Das, S. & Chakrabarti, S. Classification and prediction of protein–protein interaction interface using machine learning algorithm. Sci. Rep. 11, 1761 (2021).
Zilian, D. & Sotriffer, C. A. SFCscore(RF): a random forest-based scoring function for improved affinity prediction of protein–ligand complexes. J. Chem. Inf. Model. 53, 1923–1933 (2013).
Durrant, J. D. & McCammon, J. A. NNScore: a neural-network-based scoring function for the characterization of protein–ligand complexes. J. Chem. Inf. Model. 50, 1865–1871 (2010).
Ballester, P. J. & Mitchell, J. B. A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking. Bioinformatics 26, 1169–1175 (2010).
Li, H., Leung, K.-S., Wong, M.-H. & Ballester, P. J. Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study. BMC Bioinformatics 15, 291 (2014).
Li, Y., Zhang, X. & Cao, D. The role of shape complementarity in the protein–protein interactions. Sci. Rep. 3, 3271 (2013).
Wallach, I., Dzamba, M. & Heifets, A. AtomNet: a deep convolutional neural network for bioactivity prediction in structure-based drug discovery. Preprint at https://arxiv.org/abs/1510.02855 (2015).
Imrie, F., Bradley, A. R., van der Schaar, M. & Deane, C. M. Protein family-specific models using deep neural networks and transfer learning improve virtual screening and highlight the need for more data. J. Chem. Inf. Model. 58, 2319–2330 (2018).
Balci, A. et al. DeepInterface: protein–protein interface validation using 3D convolutional neural networks. Preprint at bioRxiv https://doi.org/10.1101/617506 (2019).
Wang, X., Terashi, G., Christoffer, C. W., Zhu, M. & Kihara, D. Protein docking model evaluation by 3D deep convolutional neural networks. Bioinformatics 36, 2113–2118 (2020).
Renaud, N. et al. DeepRank: a deep learning framework for data mining 3D protein-protein interfaces. Nat. Commun. 12, 7068 (2021).
Mohseni Behbahani, Y., Crouzet, S., Laine, E. & Carbone, A. Deep local analysis evaluates protein docking conformations with locally oriented cubes. Bioinformatics 38, 4505–4512 (2022).
Kumawat, S. & Raman, S. LP-3DCNN: unveiling local phase in 3D convolutional neural networks. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 4903–4912 (IEEE, 2019).
Geng, C. et al. iScore: a novel graph kernel-based function for scoring protein–protein docking models. Bioinformatics 36, 112–121 (2020).
Budowski-Tal, I., Kolodny, R. & Mandel-Gutfreund, Y. A novel geometry-based approach to infer protein interface similarity. Sci. Rep. 8, 1–10 (2018).
Wang, X., Flannery, S. T. & Kihara, D. Protein docking model evaluation by graph neural networks. Front. Mol. Biosci. 8, 647915 (2021).
Réau, M., Renaud, N., Xue, L. C. & Bonvin, A. M. DeepRank-GNN: a graph neural network framework to learn patterns in protein–protein interfaces. Bioinformatics 39, 759 (2023).
Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).
Sverrisson, F., Feydy, J., Correia, B. E. & Bronstein, M. M. Fast end-to-end learning on protein surfaces. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 15272–15281 (IEEE, 2021).
Zhai, X., Kolesnikov, A., Houlsby, N. & Beyer, L. Scaling vision transformers. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 12104–12113 (IEEE, 2022).
Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. 9th International Conference on Learning Representations, ICLR (2020).
Yan, Y., Tao, H., He, J. & Huang, S.-Y. The HDOCK server for integrated protein–protein docking. Nat. Protoc. 15, 1829–1852 (2020).
Janin, J. et al. CAPRI: a critical assessment of predicted interactions. Proteins 52, 2–9 (2003).
Khosla, P. et al. Supervised contrastive learning. Adv. Neural Inf. Process. Syst. 33, 18661–18673 (2020).
Chen, C. et al. This looks like that: deep learning for interpretable image recognition. Adv. Neural Inf. Process. Syst. 32, 8930–8941 (2019).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Jones, D. T. Setting the standards for machine learning in biology. Nat. Rev. Mol. Cell Biol. 20, 659–660 (2019).
Jeong, J.-J. et al. Characterization of the cupin-type phosphoglucose isomerase from the hyperthermophilic archaeon Thermococcus litoralis. FEBS Lett. 535, 200–204 (2003).
Dominguez, C., Boelens, R. & Bonvin, A. M. HADDOCK: a protein–protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 125, 1731–1737 (2003).
Wang, R., Fang, X., Lu, Y. & Wang, S. The PDBbind database: collection of binding affinities for protein–ligand complexes with known three-dimensional structures. J. Med. Chem. 47, 2977–2980 (2004).
Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2021).
Stebliankin, V. et al. EMoMiS: a pipeline for epitope-based molecular mimicry search in protein structures with applications to SARS-CoV-2. Preprint at bioRxiv https://doi.org/10.1101/2022.02.05.479274 (2022).
Balbin, C. A. et al. Epitopedia: identifying molecular mimicry between pathogens and known immune epitopes. ImmunoInformatics 9, 100023 (2023).
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Valdes, C. et al. Microbiome maps: Hilbert curve visualizations of metagenomic profiles. Front. Bioinformatics 3, 1154588 (2023).
Andrusier, N., Nussinov, R. & Wolfson, H. J. FireDock: fast interaction refinement in molecular docking. Proteins 69, 139–159 (2007).
Gray, J. J. et al. Protein–protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J. Mol. Biol. 331, 281–299 (2003).
Zhang, C., Vasmatzis, G., Cornette, J. L. & DeLisi, C. Determination of atomic desolvation energies from the structures of crystallized proteins. J. Mol. Biol. 267, 707–726 (1997).
Dunbrack Jr, R. L. & Cohen, F. E. Bayesian statistical analysis of protein side-chain rotamer preferences. Protein Sci. 6, 1661–1681 (1997).
Neria, E., Fischer, S. & Karplus, M. Simulation of activation free energies in molecular systems. J. Chem. Phys. 105, 1902–1921 (1996).
Crowley, P. B. & Golovin, A. Cation–π interactions in protein–protein interfaces. Proteins 59, 231–239 (2005).
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
Touw, W. G. et al. A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 43, 364–368 (2015).
Mead, A. Review of the development of multidimensional scaling methods. J. R. Stat. Soc. Ser. D 41, 27–39 (1992).
Ng, A. Y. Feature selection, L₁ vs. L₂ regularization, and rotational invariance. In Proc. Twenty-first International Conference on Machine Learning 78 (Association for Computing Machinery, 2004).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. 7th International Conference on Learning Representations, ICLR, 6–9 (2017).
Lensink, M. F. & Wodak, S. J. Score_set: a CAPRI benchmark for scoring protein complexes. Proteins 82, 3163–3169 (2014).
Zhang, C., Shine, M., Pyle, A. M. & Zhang, Y. US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nat. Methods 19, 1109–1115 (2022).
Cheng, T. M.-K., Blundell, T. L. & Fernandez-Recio, J. pyDock: electrostatics and desolvation for effective scoring of rigid-body protein–protein docking. Proteins 68, 503–515 (2007).
Leman, J. K. et al. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 17, 665–680 (2020).
Pierce, B. & Weng, Z. A combination of rescoring and refinement significantly improves protein docking performance. Proteins 72, 270–279 (2008).
Viswanath, S., Ravikant, D. & Elber, R. Improving ranking of models for protein complexes with side chain modeling and atomic potentials. Proteins 81, 592–606 (2013).
Pons, C., Talavera, D., De La Cruz, X., Orozco, M. & Fernandez-Recio, J. Scoring by intermolecular pairwise propensities of exposed residues (SIPPER): a new efficient potential for protein–protein docking. J. Chem. Inf. Model. 51, 370–377 (2011).
Ravikant, D. & Elber, R. Pie-efficient filters and coarse grained potentials for unbound protein–protein docking. Proteins 78, 400–419 (2010).
Moal, I. H., Jiménez-García, B. & Fernández-Recio, J. CCharPPI web server: computational characterization of protein–protein interactions from structure. Bioinformatics 31, 123–125 (2015).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Renaud, N. & Geng, C. The pdb2sql Python package: parsing, manipulation and analysis of PDB files using SQL queries. J. Open Source Softw. 5, 2077 (2020).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019).
Collaborative Data Science (Plotly Technologies, 2015); https://plot.ly
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
The PyMOL Molecular Graphics System, Version 1.8 (Schrödinger, LLC, 2015).
Stebliankin, V. stebliankin/piston: PIsToN (v.1.0.0). Zenodo https://doi.org/10.5281/zenodo.8102876 (2023).
Shore, D., Issafras, H., Landais, E., Teyton, L. & Wilson, I. The crystal structure of CD8 in complex with YTS156.7.7 Fab and interaction with other CD8 antibodies define the binding mode of CD8 αβ to MHC class I. J. Mol. Biol. 384, 1190–1202 (2008).

Acknowledgements

We thank the members of the Bioinformatics Research Group (BioRG) at FIU for their valuable feedback and comments. This work grew out of a project that was supported by grants from the National Science Foundation (CNS-2037374 and OAC-2118329). V.S. gratefully acknowledges funding for graduate assistantships from the Department of Education, the National Science Foundation, and the Knight Foundation School of Computing and Information Sciences.

Author information

Authors and Affiliations

Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL, USA
Vitalii Stebliankin, Azam Shirali, Jimeng Shi & Giri Narasimhan
Department of Physics, College of Arts, Science and Education, Florida International University, Miami, FL, USA
Prabin Baral & Prem Chapagain
Biomolecular Sciences Institute, Florida International University, Miami, FL, USA
Prem Chapagain, Kalai Mathee & Giri Narasimhan
Herbert Wertheim College of Medicine, Florida International University, Miami, FL, USA
Kalai Mathee

Contributions

G.N. conceptualized and supervised the project. V.S. designed, implemented and tested the PIsToN framework. V.S. and A.S. executed benchmarks. P.B., J.S., P.C. and K.M. assisted in the feature extraction design and interpretations. V.S., A.S., K.M. and G.N. contributed to the paper writing and rewriting.

Corresponding author

Correspondence to Giri Narasimhan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Alex Morehead and Jason Yim for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Note 1, Figs. 1–6 and Tables 1 and 2.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Stebliankin, V., Shirali, A., Baral, P. et al. Evaluating protein binding interfaces with transformer networks. Nat Mach Intell 5, 1042–1053 (2023). https://doi.org/10.1038/s42256-023-00715-4

Received: 10 December 2022
Accepted: 03 August 2023
Published: 07 September 2023
Issue Date: September 2023
DOI: https://doi.org/10.1038/s42256-023-00715-4