Protein–protein contact prediction by geometric triangle-aware protein language models

Abstract

Information regarding the residue–residue distance between interacting proteins is important for modelling the structures of protein complexes, as well as being valuable for understanding the molecular mechanism of protein–protein interactions. With the advent of deep learning, many methods have been developed to accurately predict the intra-protein residue–residue contacts of monomers. However, it is still challenging to accurately predict inter-protein residue–residue contacts for protein complexes, especially hetero-protein complexes. Here we develop a protein language model-based deep learning method to predict the inter-protein residue–residue contacts of protein complexes—named DeepInter—by introducing a triangle-aware mechanism of triangle update and triangle self-attention into the deep neural network. We extensively validate DeepInter on diverse test sets of 300 homodimeric, 28 CASP-CAPRI homodimeric and 99 heterodimeric complexes and compare it with state-of-the-art methods including CDPred, DeepHomo2.0, GLINTER and DeepHomo. The results demonstrate the accuracy and robustness of DeepInter.
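As an illustration of the quantity being predicted, the following minimal sketch derives a binary inter-protein contact map from the coordinates of two chains. It is not the DeepInter method; the function name `inter_chain_contacts`, the input format (one representative atom per residue) and the 8 Å cutoff are assumptions made for this example, not details taken from the article.

```python
import math


def inter_chain_contacts(coords_a, coords_b, cutoff=8.0):
    """Return a len(coords_a) x len(coords_b) binary contact map.

    coords_a, coords_b: lists of (x, y, z) tuples, one representative
    atom per residue (hypothetical input format).
    cutoff: distance threshold in angstroms; 8.0 is an assumed value
    chosen for illustration, not one stated in the article.
    """
    return [
        [1 if math.dist(a, b) <= cutoff else 0 for b in coords_b]
        for a in coords_a
    ]


# Toy example: two residues in chain A, three residues in chain B.
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
chain_b = [(0.0, 6.0, 0.0), (3.8, 9.0, 0.0), (20.0, 0.0, 0.0)]
contact_map = inter_chain_contacts(chain_a, chain_b)
```

A predictor of the kind described here would output a probability for each entry of such a map rather than a binary value computed from known coordinates.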

Fig. 1: The framework of DeepInter.
Fig. 2: Comparison of DeepInter with other methods on the Homodimer300 test set with experimental structures as input.
Fig. 3: Comparison of DeepInter with other methods on the Heterodimer99 test set with the input of experimental structures.
Fig. 4: Ablation experiments with DeepInter on the Homodimer300 test set with the input of experimental structures.
Fig. 5: Examples of predicted contact maps.

Data availability

The data that support the findings of this study are available from the corresponding author upon request. A full list of the protein complexes used in this study is also provided in Supplementary Data 10. The protein structures used in this study are all available in the PDB, and the Uniref30_2020_03 sequence database used in this study is available at https://www.uniprot.org/help/uniref/. Source data are provided with this paper.

Code availability

The DeepInter package is freely available at http://huanglab.phys.hust.edu.cn/DeepInter/ and https://doi.org/10.5281/zenodo.8304327 (ref. 61).

References

  1. Yadid, I. & Tawfik, D. S. Reconstruction of functional beta-propeller lectins via homo-oligomeric assembly of shorter fragments. J. Mol. Biol. 365, 10–17 (2007).

  2. Goodsell, D. S. & Olson, A. J. Structural symmetry and protein function. Annu. Rev. Biophys. Biomol. Struct. 29, 105–153 (2000).

  3. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  4. Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. & Moult, J. Critical assessment of methods of protein structure prediction (CASP)-Round XIV. Proteins 89, 1607–1617 (2021).

  5. Ko, J. & Lee, J. Can AlphaFold2 predict protein-peptide complex structures accurately? Preprint at https://www.biorxiv.org/content/10.1101/2021.07.27.453972v2 (2021).

  6. Bryant, P., Pozzati, G. & Elofsson, A. Improved prediction of protein–protein interactions using AlphaFold2. Nat. Commun. 13, 1265 (2022).

  7. Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).

  8. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2 (2021).

  9. Yan, Y. & Huang, S. Y. Accurate prediction of inter-protein residue-residue contacts for homo-oligomeric protein complexes. Brief. Bioinformatics 22, bbab038 (2021).

  10. Yan, Y., Tao, H., He, J. & Huang, S. Y. The HDOCK server for integrated protein–protein docking. Nat. Protoc. 15, 1829–1852 (2020).

  11. Du, Z. et al. The trRosetta server for fast and accurate protein structure prediction. Nat. Protoc. 16, 5634–5651 (2021).

  12. Yan, Y., Tao, H. & Huang, S. Y. HSYMDOCK: a docking web server for predicting the structure of protein homo-oligomers with Cn or Dn symmetry. Nucleic Acids Res. 46, W423–W431 (2018).

  13. Soltanikazemi, E., Quadir, F., Roy, R. S., Guo, Z. & Cheng, J. Distance-based reconstruction of protein quaternary structures from inter-chain contacts. Proteins 90, 720–731 (2022).

  14. Hopf, T. A. et al. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3, e03430 (2014).

  15. Roy, R. S., Quadir, F., Soltanikazemi, E. & Cheng, J. A deep dilated convolutional residual network for predicting interchain contacts of protein homodimers. Bioinformatics 38, 1904–1910 (2022).

  16. Quadir, F., Roy, R. S., Halfmann, R. & Cheng, J. DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning. Sci. Rep. 11, 12295 (2021).

  17. Sanchez-Garcia, R., Sorzano, C. O. S., Carazo, J. M. & Segura, J. BIPSPI: a method for the prediction of partner-specific protein-protein interfaces. Bioinformatics 35, 470–477 (2019).

  18. Sanchez-Garcia, R., Macias, J. R., Sorzano, C. O. S., Carazo, J. M. & Segura, J. BIPSPI+: mining type-specific datasets of protein complexes to improve protein binding site prediction. J. Mol. Biol. 434, 167556 (2022).

  19. Zhao, Z. & Gong, X. Protein-protein interaction interface residue pair prediction based on deep learning architecture. IEEE/ACM Trans. Comput. Biol. Bioinform. 16, 1753–1759 (2019).

  20. Liu, J. & Gong, X. Attention mechanism enhanced LSTM with residual architecture and its application for protein-protein interaction residue pairs prediction. BMC Bioinformatics 20, 609 (2019).

  21. Soleymani, F., Paquet, E., Viktor, H., Michalowski, W. & Spinello, D. Protein-protein interaction prediction with deep learning: a comprehensive review. Comput. Struct. Biotechnol. J. 20, 5316–5341 (2022).

  22. Baranwal, M. et al. Struct2Graph: a graph attention network for structure based predictions of protein-protein interactions. BMC Bioinformatics 23, 370 (2022).

  23. Hu, X., Feng, C., Zhou, Y., Harrison, A. & Chen, M. DeepTrio: a ternary prediction system for protein-protein interaction using mask multiple parallel convolutional neural networks. Bioinformatics 38, 694–702 (2022).

  24. Soleymani, F., Paquet, E., Viktor, H. L., Michalowski, W. & Spinello, D. ProtInteract: a deep learning framework for predicting protein-protein interactions. Comput. Struct. Biotechnol. J. 21, 1324–1348 (2023).

  25. Morcos, F. et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl Acad. Sci. USA 108, E1293–E1301 (2011).

  26. Jones, D. T., Buchan, D. W., Cozzetto, D. & Pontil, M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28, 184–190 (2012).

  27. Ekeberg, M., Lövkvist, C., Lan, Y., Weigt, M. & Aurell, E. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys. Rev. E 87, 012707 (2013).

  28. Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).

  29. Li, Y. et al. Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLoS Comput. Biol. 17, e1008865 (2021).

  30. Adhikari, B., Hou, J. & Cheng, J. DNCON2: improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics 34, 1466–1472 (2018).

  31. Wang, S., Sun, S., Li, Z., Zhang, R. & Xu, J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 13, e1005324 (2017).

  32. Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).

  33. Wang, W., Peng, Z. & Yang, J. Single-sequence protein structure prediction using supervised transformer protein language models. Nat. Comput. Sci. 2, 804–814 (2022).

  34. Lin, P., Yan, Y. & Huang, S. Y. DeepHomo2.0: improved protein-protein contact prediction of homodimers by transformer-enhanced deep learning. Brief. Bioinformatics 24, bbac499 (2023).

  35. Xie, Z. & Xu, J. Deep graph learning of inter-protein contacts. Bioinformatics 38, 947–953 (2022).

  36. Guo, Z., Liu, J., Skolnick, J. & Cheng, J. Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks. Nat. Commun. 13, 6963 (2022).

  37. Bitbol, A. F., Dwyer, R. S., Colwell, L. J. & Wingreen, N. S. Inferring interaction partners from protein sequences. Proc. Natl Acad. Sci. USA 113, 12180–12185 (2016).

  38. Szurmant, H. & Weigt, M. Inter-residue, inter-protein and inter-family coevolution: bridging the scales. Curr. Opin. Struct. Biol. 50, 26–32 (2018).

  39. Gueudré, T., Baldassi, C., Zamparo, M., Weigt, M. & Pagnani, A. Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis. Proc. Natl Acad. Sci. USA 113, 12186–12191 (2016).

  40. Ovchinnikov, S., Kamisetty, H. & Baker, D. Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. eLife 3, e02030 (2014).

  41. Zeng, H. et al. ComplexContact: a web server for inter-protein contact prediction using deep learning. Nucleic Acids Res. 46, W432–W437 (2018).

  42. Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).

  43. Lensink, M. F. et al. The challenge of modeling protein assemblies: the CASP12-CAPRI experiment. Proteins 86, 257–273 (2018).

  44. Lensink, M. F. et al. Blind prediction of homo- and hetero-protein complexes: the CASP13-CAPRI experiment. Proteins 87, 1200–1221 (2019).

  45. Rao, R. et al. MSA transformer. Proc. 38th International Conference on Machine Learning 139, 8844–8856 (PMLR, 2021).

  46. Hatos, A. et al. DisProt: intrinsic protein disorder annotation in 2020. Nucleic Acids Res. 48, D269–D276 (2020).

  47. Xu, J. & Zhang, Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 26, 889–895 (2010).

  48. Mirdita, M. et al. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 45, D170–D176 (2017).

  49. Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative protein sequence searching by HMM–HMM alignment. Nat. Methods 9, 173–175 (2011).

  50. Seemayer, S., Gruber, M. & Söding, J. CCMpred—fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics 30, 3128–3130 (2014).

  51. Steinegger, M. et al. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics 20, 473 (2019).

  52. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

  53. Si, Y. & Yan, C. Improved protein contact prediction using dimensional hybrid residual networks and singularity enhanced loss function. Brief. Bioinformatics 22, bbab341 (2021).

  54. Su, H. et al. Improved protein structure prediction using a new multi-scale network and homologous templates. Adv. Sci. 8, e2102592 (2021).

  55. Hubbard, S. J. & Thornton, J. M. NACCESS: computer program (Department of Biochemistry and Molecular Biology, University College London, 1993).

  56. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).

  57. Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive datasets. Nat. Biotechnol. 35, 1026–1028 (2017).

  58. Steinegger, M. & Söding, J. Clustering huge protein sequence sets in linear time. Nat. Commun. 9, 2542 (2018).

  59. Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollar, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 42, 318–327 (2020).

  60. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR) (2015).

  61. Lin, P., Tao, H., Li, H. & Huang, S.-Y. Protein-protein contact prediction by geometric triangle-aware protein language models. Zenodo (2023); https://doi.org/10.5281/zenodo.8304327

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grants nos. 32161133002 and 62072199) and a startup grant of Huazhong University of Science and Technology.

Author information

Contributions

S.-Y.H. conceived and supervised the project. P.L. and S.-Y.H. designed and performed the experiments. P.L., H.T. and H.L. analysed the data. P.L., H.T., H.L. and S.-Y.H. wrote the paper. All authors reviewed and approved the final version of the paper.

Corresponding author

Correspondence to Sheng-You Huang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Jacob Huth, in collaboration with the Nature Machine Intelligence team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Impact of MSA depth and contact density, Impact of sequence cropping size, Impact of conformational changes, Impact of intrinsically disordered proteins, Impact of structural similarity, Impact of intra-protein distance usage, Supplementary Tables 1–4, Fig. 1, algorithm and reference.

Supplementary Data

Supplementary data 1–10.

Source data

Source Data

Source data Table 1–3, Source data Fig. 2–4, Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lin, P., Tao, H., Li, H. et al. Protein–protein contact prediction by geometric triangle-aware protein language models. Nat Mach Intell 5, 1275–1284 (2023). https://doi.org/10.1038/s42256-023-00741-2

  • DOI: https://doi.org/10.1038/s42256-023-00741-2
