Expanding the sequence spaces of synthetic binding protein using deep learning-based framework ProteinMPNN

  • Research Article
  • Frontiers of Computer Science

Abstract

Synthetic binding proteins (SBPs), which combine small size, high solubility and stability, and high affinity, are important tools for protein-based research, treatment, and diagnostics. Over the last several decades, the great majority of SBPs have been obtained through site-directed mutagenesis and directed evolution of privileged protein scaffolds. In recent years, groundbreaking advances in deep learning (DL) have revolutionized protein structure prediction and design. Here, for the first time, the cutting-edge DL framework ProteinMPNN was applied to the de novo design of 7,245 new synthetic proteins covering 55 different scaffolds, based on the original SBPs collected in our SYNBIP database. Comprehensive bioinformatics analysis indicated that the designed synthetic proteins achieve excellent sequence recovery. They also show significantly improved solubility and thermal stability compared with currently known SBPs. In addition, 8 protein scaffolds that are especially well suited to ProteinMPNN were identified. The synthetic proteins designed from these scaffolds showed good computationally predicted binding to their corresponding protein targets. The DL-based framework therefore shows great potential for the target-directed de novo generation of high-quality synthetic protein libraries. Such libraries could help experimental biologists use rational protein engineering to discover novel functional protein binders.
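The abstract uses sequence recovery as a measure of design quality. For fixed-backbone design, sequence recovery is conventionally the fraction of positions at which a designed sequence reproduces the native residue. The Python sketch below computes this metric. It assumes the native and designed sequences have equal length and are matched position by position, as they are when both sit on the same backbone. The function names and example sequences are hypothetical and are not taken from the authors' pipeline.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Assumption: both sequences belong to the same backbone, so they have
    equal length and position i in one corresponds to position i in the other.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length (same backbone)")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(native, designed) if a == b)
    return matches / len(native)


def mean_recovery(native: str, designs: list[str]) -> float:
    """Average recovery over several sequences sampled for one scaffold."""
    if not designs:
        raise ValueError("need at least one designed sequence")
    return sum(sequence_recovery(native, d) for d in designs) / len(designs)


if __name__ == "__main__":
    native = "MKTAYIAKQR"          # hypothetical 10-residue native sequence
    designs = ["MKTAYIAKQR",       # identical to native: recovery 1.0
               "MKSAYLAKER"]       # 3 substitutions: recovery 0.7
    for d in designs:
        print(d, sequence_recovery(native, d))
    print("mean", mean_recovery(native, designs))
```

Averaging per-design recovery over all sequences sampled for a scaffold gives one number per scaffold. That number is a simple way to compare how closely designs on different scaffolds stay to their native sequences.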

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 21505009) and the Entrepreneurship and Innovation Support Plan for Chinese Overseas Students of Chongqing (cx2020127).

Author information

Corresponding authors

Correspondence to Feng Zhu or Weiwei Xue.

Ethics declarations

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Electronic supplementary material

Supplementary material is available in the online version of this article at journal.hep.com.cn and link.springer.com.

Yanlin Li received her bachelor’s degree in 2023 from the School of Pharmacy, Chongqing University, China. She is currently working toward a master’s degree at Chongqing University, China. Her research interests mainly include protein design and database construction.

Wantong Jiao is an undergraduate student (class of 2021) at the School of Pharmacy, Chongqing University, China. She is interested in the molecular mechanisms of drug-target recognition and interaction, as well as research on new drug dosage forms.

Ruihan Liu is currently studying for a master’s degree at Chongqing University in China. Her research interests mainly include database construction and structure-based drug design and screening.

Xuejin Deng received a bachelor’s degree from Guizhou University, China. She is currently pursuing a master’s degree at the School of Pharmacy, Chongqing University, China. Her research interest is protein design.

Feng Zhu is the Deputy Director of the B&R International School of Medicine and a Distinguished Professor of Pharmaceutical Sciences at Zhejiang University, China. He obtained bachelor’s and master’s degrees in Physics from Beijing Normal University, and a PhD in Pharmacy from the National University of Singapore. His team draws on artificial intelligence and omics (proteomics and metabolomics) technologies to systematically explore the druggability and system profiles of therapeutic targets. The team also develops novel methods and online tools for target discovery and studies the mechanisms underlying the interactions between drugs and their targets.

Weiwei Xue is an associate professor of Pharmaceutical Sciences at Chongqing University, China. He received a bachelor’s degree in Chemistry (2009) and a PhD in Cheminformatics (2014) from Lanzhou University, China. He worked as a visiting scholar at the Institute for Protein Design at the University of Washington, USA (2018–2019). Research in Dr. Xue’s Lab focuses on developing disease- and therapy-related bioinformatics databases and tools. The lab also combines artificial intelligence and molecular modeling approaches to design innovative small molecules or protein binders against molecular targets of complex diseases, including psychiatric disorders, viral infections, and cancer. He has published more than 90 peer-reviewed papers in bioinformatics and computational drug design.

About this article

Cite this article

Li, Y., Jiao, W., Liu, R. et al. Expanding the sequence spaces of synthetic binding protein using deep learning-based framework ProteinMPNN. Front. Comput. Sci. 19, 195903 (2025). https://doi.org/10.1007/s11704-024-31060-3

  • DOI: https://doi.org/10.1007/s11704-024-31060-3
