Abstract
Designing protein sequences towards desired properties is a fundamental goal of protein engineering, with applications in drug discovery and enzyme engineering. Machine learning-guided directed evolution has shown success in expediting the optimization cycle and reducing experimental burden. However, efficient sampling in the vast design space remains a challenge. To address this, we propose EvoPlay, a self-play reinforcement learning framework based on the single-player version of AlphaZero. In this work, we treat a single-residue mutation as an action to optimize protein sequences, analogous to playing pieces on a chessboard. A policy-value neural network reciprocally interacts with look-ahead Monte Carlo tree search to guide the optimization agent with both breadth and depth. We extensively evaluate EvoPlay on a suite of in silico directed evolution tasks over full-length sequences or combinatorial sites using functional surrogates. EvoPlay also supports AlphaFold2 as a structural surrogate to design peptide binders with high affinities, validated by binding assays. Moreover, we harness EvoPlay to prospectively engineer luciferase, discovering variants with a 7.8-fold bioluminescence improvement over the wild type. In sum, EvoPlay holds great promise for facilitating protein design to tackle unmet academic, industrial and clinical needs.
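To make the abstract's game analogy concrete, the following is a minimal, self-contained sketch of the core idea: each action is a single-residue substitution, and a look-ahead search guided by a surrogate reward selects which mutation to commit. The surrogate function, the flat one-ply UCB1 search, and all names here are illustrative simplifications, not the authors' implementation (EvoPlay couples a policy-value network with deeper Monte Carlo tree search and learned surrogates).

```python
import math

# 20 canonical amino acids; one "move" substitutes a single residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def surrogate_fitness(seq):
    """Toy stand-in for a trained functional surrogate (illustrative only):
    rewards hydrophobic residues at even positions."""
    return sum(aa in "AILMFV" for i, aa in enumerate(seq) if i % 2 == 0) / len(seq)

def single_site_mutants(seq):
    """Enumerate all single-site substitutions of seq (the action space)."""
    return [seq[:i] + aa + seq[i + 1:]
            for i in range(len(seq)) for aa in AMINO_ACIDS if aa != seq[i]]

def ucb_step(seq, n_sim=200, c_explore=1.4):
    """One optimization move chosen by a flat UCB1 search over mutants.
    This one-ply bandit only conveys the breadth/depth trade-off; the full
    method searches deeper with a policy-value network as prior."""
    children = single_site_mutants(seq)
    visits = [0] * len(children)
    totals = [0.0] * len(children)
    for t in range(1, n_sim + 1):
        # Select the child with the highest upper confidence bound.
        best_k, best_u = 0, float("-inf")
        for k in range(len(children)):
            if visits[k] == 0:
                u = float("inf")  # ensure every child is tried at least once
            else:
                u = totals[k] / visits[k] + c_explore * math.sqrt(math.log(t) / visits[k])
            if u > best_u:
                best_k, best_u = k, u
        reward = surrogate_fitness(children[best_k])  # surrogate acts as reward
        visits[best_k] += 1
        totals[best_k] += reward
    means = [totals[k] / max(visits[k], 1) for k in range(len(children))]
    return children[means.index(max(means))]

def optimize(seq, steps=3):
    """Greedy outer loop: repeatedly commit the best searched mutation."""
    for _ in range(steps):
        seq = ucb_step(seq)
    return seq
```

Under this toy surrogate, starting from a poly-glycine sequence, each step commits one beneficial substitution until the even positions are hydrophobic.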
Data availability
Datasets used in all benchmark studies have been published previously. The PAB1 (UniProt P04147) and GFP (UniProt P42212) datasets used for EvoPlay surrogate training have been archived (ref. 97). The four-site library GB1 (UniProt P19909) and PhoQ (UniProt P23837) datasets have been provided (ref. 98). Peptide and receptor sequences of the WT for 1ssc, 2cnz, 3r7g and 6seo (PDB) have been referenced (ref. 99). The MSA and logo used in GLuc design are presented in Supplementary Data 1 and Extended Data Fig. 1. Our in-house GLuc library can be found in Supplementary Data 2, and experimentally validated variants proposed by EvoPlay are presented in Supplementary Data 3. Source data are provided with this paper (ref. 100).
Code availability
EvoPlay is written in Python (v3.8.10) using the PyTorch (v1.12.1) library. The source code is available on GitHub (https://github.com/melobio/EvoPlay) under the GPLv3 license. The DOI for an archived copy of the GitHub repository is provided via Zenodo (ref. 101). The Code Ocean capsule for EvoPlay is also published (ref. 102).
Change history
08 August 2023
A Correction to this paper has been published: https://doi.org/10.1038/s42256-023-00713-6
References
Romero, P. A. & Arnold, F. H. Exploring protein fitness landscapes by directed evolution. Nat. Rev. Mol. Cell Biol. 10, 866–876 (2009).
Wu, Z., Kan, S. J., Lewis, R. D., Wittmann, B. J. & Arnold, F. H. Machine learning-assisted directed protein evolution with combinatorial libraries. Proc. Natl Acad. Sci. USA 116, 8852–8858 (2019).
Yang, K. K., Wu, Z. & Arnold, F. H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 16, 687–694 (2019).
Luo, Y. et al. ECNet is an evolutionary context-integrated deep learning framework for protein engineering. Nat. Commun. 12, 5743–5756 (2021).
Greenhalgh, J. C., Fahlberg, S. A., Pfleger, B. F. & Romero, P. A. Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production. Nat. Commun. 12, 5825–5834 (2021).
Wittmann, B. J., Yue, Y. & Arnold, F. H. Informed training set design enables efficient machine learning-assisted directed protein evolution. Cell Syst. 12, 1026–1045 (2021).
Hie, B. L. & Yang, K. K. Adaptive machine learning for protein engineering. Curr. Opin. Struct. Biol. 72, 145–152 (2022).
Qiu, Y., Hu, J. & Wei, G.-W. Cluster learning-assisted directed evolution. Nat. Comput. Sci. 1, 809–818 (2021).
Kawashima, S. & Kanehisa, M. AAindex: amino acid index database. Nucleic Acids Res. 28, 374 (2000).
Ofer, D. & Linial, M. ProFET: feature engineering captures high-level protein functions. Bioinformatics 31, 3429–3436 (2015).
Georgiev, A. G. Interpretable numerical descriptors of amino acid space. J. Comput. Biol. 16, 703–723 (2009).
Elnaggar, A. et al. ProtTrans: toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127 (2022).
Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).
Rao, R. M. et al. MSA Transformer. Proc. Mach. Learning Res. 139, 8844–8856 (2021).
Sinai, S. et al. AdaLead: a simple and robust adaptive greedy search algorithm for sequence design. Preprint at https://arxiv.org/abs/2010.02141 (2020).
Biswas, S., Khimulya, G., Alley, E. C., Esvelt, K. M. & Church, G. M. Low-N protein engineering with data-efficient deep learning. Nat. Methods 18, 389–396 (2021).
Ren, Z. et al. Proximal exploration for model-guided protein sequence design. Proc. Mach. Learning Res. 162, 18520–18536 (2022).
Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Verkuil, R. et al. Language models generalize beyond natural proteins. Preprint at bioRxiv https://doi.org/10.1101/2022.12.21.521521 (2022).
Hie, B. et al. A high-level programming language for generative protein design. Preprint at bioRxiv https://doi.org/10.1101/2022.12.21.521526 (2022).
González, J. et al. Batch Bayesian optimization via local penalization. Proc. Mach. Learning Res. 51, 648–657 (2016).
Hie, B., Bryson, B. D. & Berger, B. Leveraging uncertainty in machine learning accelerates biological discovery and design. Cell Syst. 11, 461–477 (2020).
Williams, C. K. & Rasmussen, C. E. Gaussian Processes for Machine Learning (MIT Press, 2006).
Romero, P. A., Krause, A. & Arnold, F. H. Navigating the protein fitness landscape with Gaussian processes. Proc. Natl Acad. Sci. USA 110, E193–E201 (2013).
Bryant, D. H. et al. Deep diversification of an AAV capsid protein by machine learning. Nat. Biotechnol. 39, 691–696 (2021).
Brookes, D. H. & Listgarten, J. Design by adaptive sampling. Preprint at https://arxiv.org/abs/1810.03714 (2018).
Brookes, D., Park, H. & Listgarten, J. Conditioning by adaptive sampling for robust design. Proc. Mach. Learning Res. 97, 773–782 (2019).
Castro, E. et al. Transformer-based protein generation with regularized latent space optimization. Nat. Mach. Intell. 4, 840–851 (2022).
Browne, C. B. et al. A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4, 1–43 (2012).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
Mirhoseini, A. et al. A graph placement methodology for fast chip design. Nature 594, 207–212 (2021).
Degrave, J. et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602, 414–419 (2022).
Shree Sowndarya, S. V. et al. Multi-objective goal-directed optimization of de novo stable organic radicals for aqueous redox flow batteries. Nat. Mach. Intell. 4, 720–730 (2022).
Fawzi, A. et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 610, 47–53 (2022).
Feng, S. et al. Dense reinforcement learning for safety validation of autonomous vehicles. Nature 615, 620–627 (2023).
Angermueller, C. et al. Model-based reinforcement learning for biological sequence design. In International Conference on Learning Representations (ICLR, 2020).
Isaac, I. D. et al. Top-down design of protein architectures with reinforcement learning. Science 380, 266–273 (2023).
Nakatsu, T. et al. Structural basis for the spectral difference in luciferase bioluminescence. Nature 440, 372–376 (2006).
Sarkisyan, K. S. et al. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016).
Melamed, D., Young, D. L., Gamble, C. E., Miller, C. R. & Fields, S. Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein. RNA 19, 1537–1551 (2013).
Jain, M. et al. Biological sequence design with GFlowNets. Proc. Mach. Learning Res. 162, 9786–9801 (2022).
Rao, R. et al. Evaluating protein transfer learning with TAPE. Adv. Neural Inf. Process. Syst. 32, 9689–9701 (2019).
Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. Soft Actor-Critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proc. Mach. Learning Res. 80, 1861–1870 (2018).
Shanehsazzadeh, A., Belanger, D. & Dohan, D. Is transfer learning necessary for protein landscape prediction? Preprint at https://arxiv.org/abs/2011.03443 (2020).
Alley, E. C., Khimulya, G., Biswas, S., AlQuraishi, M. & Church, G. M. Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1315–1322 (2019).
Illig, A.-M., Siedhoff, N. E., Schwaneberg, U. & Davari, M. D. A hybrid model combining evolutionary probability and machine learning leverages data-driven protein engineering. Preprint at bioRxiv https://doi.org/10.1101/2022.06.07.495081 (2022).
Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. In Advances in Neural Information Processing Systems 34 (eds M. Ranzato), 29287–29303 (NeurIPS, 2021).
Linding, R. et al. Protein disorder prediction: implications for structural proteomics. Structure 11, 1453–1459 (2003).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2022).
Tsaban, T. et al. Harnessing protein folding neural networks for peptide–protein docking. Nat. Commun. 13, 176 (2022).
Jendrusch, M., Korbel, J. O. & Sadiq, S. K. AlphaDesign: a de novo protein design framework based on AlphaFold. Preprint at bioRxiv https://doi.org/10.1101/2021.10.11.463937 (2021).
Wicky, B. et al. Hallucinating symmetric protein assemblies. Science 378, 56–61 (2022).
Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
Bennett, N. R. et al. Improving de novo protein binder design with deep learning. Nat. Commun. 14, 2625–2633 (2023).
Bryant, P. & Elofsson, A. EvoBind: in silico directed evolution of peptide binders with AlphaFold. Preprint at bioRxiv https://doi.org/10.1101/2022.07.23.501214 (2022).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Tareen, A. & Kinney, J. B. Logomaker: beautiful sequence logos in Python. Bioinformatics 36, 2272–2274 (2020).
Miller, B. R. III et al. MMPBSA.py: an efficient program for end-state free energy calculations. J. Chem. Theory Comput. 8, 3314–3321 (2012).
Hopf, T. A. et al. The EVcouplings Python framework for coevolutionary sequence analysis. Bioinformatics 35, 1582–1584 (2019).
Welsh, J. P., Patel, K. G., Manthiram, K. & Swartz, J. R. Multiply mutated Gaussia luciferases provide prolonged and intense bioluminescence. Biochem. Biophys. Res. Commun. 389, 563–568 (2009).
Kim, S. B., Suzuki, H., Sato, M. & Tao, H. Superluminescent variants of marine luciferases for bioassays. Anal. Chem. 83, 8732–8740 (2011).
Zhang, C., Zheng, W., Mortuza, S., Li, Y. & Zhang, Y. DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins. Bioinformatics 36, 2105–2112 (2020).
Wu, N. et al. Solution structure of Gaussia luciferase with five disulfide bonds and identification of a putative coelenterazine binding cavity by heteronuclear NMR. Sci. Rep. 10, 20069 (2020).
Lu, H. et al. Machine learning-aided engineering of hydrolases for PET depolymerization. Nature 604, 662–667 (2022).
Norn, C. et al. Protein sequence design by conformational landscape optimization. Proc. Natl Acad. Sci. USA 118, e2017228118 (2021).
Hsu, C. et al. Learning inverse folding from millions of predicted structures. Proc. Mach. Learning Res. 162, 8946–8970 (2022).
Makowski, E. K. et al. Co-optimization of therapeutic antibody affinity and specificity using machine learning models that generalize to novel mutational space. Nat. Commun. 13, 3788 (2022).
Markel, U. et al. Advances in ultrahigh-throughput screening for directed enzyme evolution. Chem. Soc. Rev. 49, 233–262 (2020).
Gérard, A. et al. High-throughput single-cell activity-based screening and sequencing of antibodies using droplet microfluidics. Nat. Biotechnol. 38, 715–721 (2020).
Dörr, M. et al. Fully automatized high-throughput enzyme library screening using a robotic platform. Biotechnol. Bioeng. 113, 1421–1432 (2016).
Wittmann, B. J. et al. evSeq: cost-effective amplicon sequencing of every variant in a protein library. ACS Synth. Biol. 11, 1313–1324 (2022).
Ingraham, J., Garg, V., Barzilay, R. & Jaakkola, T. Generative models for graph-based protein design. In Advances in Neural Information Processing Systems 32 (eds H. Wallach et al.) 15820–15831 (NeurIPS, 2019).
Wang, J. et al. Scaffolding protein functional sites using deep learning. Science 377, 387–394 (2022).
Bell, E. L. et al. Biocatalysis. Nat. Rev. Methods Primers 1, 45 (2021).
Hie, B. L. et al. Efficient evolution of human antibodies from general protein language models and sequence information alone. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01763-2 (2023).
The PyMOL Molecular Graphics System v.1.2 r3pre (Schrödinger, 2011).
Huang, X., Pearce, R. & Zhang, Y. EvoEF2: accurate and fast energy function for computational protein design. Bioinformatics 36, 1135–1142 (2020).
Steinegger, M. et al. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinform. 20, 473 (2019).
Podgornaia, A. I. & Laub, M. T. Pervasive degeneracy and epistasis in a protein–protein interface. Science 347, 673–677 (2015).
Bergstra, J., Yamins, D. & Cox, D. Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. Proc. Mach. Learning Res. 28, 115–123 (2013).
Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).
Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).
Schneider, T. D. & Stephens, R. M. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18, 6097–6100 (1990).
Morris, G. et al. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J. Comput. Chem. 30, 2785–2791 (2009).
Van Der Spoel, D. et al. GROMACS: fast, flexible, and free. J. Comput. Chem. 26, 1701–1718 (2005).
Lindorff‐Larsen, K. et al. Improved side‐chain torsion potentials for the Amber ff99SB protein force field. Proteins 78, 1950–1958 (2010).
Lu, T. Sobtop v.1.0 (dev3.1) http://sobereva.com/soft/Sobtop (2022).
Neese, F. Software update: the ORCA program system—Version 5.0. Wiley Interdiscip. Rev. Comput. Mol. Sci. 12, e1606 (2022).
Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935 (1983).
Darden, T., York, D. & Pedersen, L. Particle mesh Ewald: an N·log(N) method for Ewald sums in large systems. J. Chem. Phys. 98, 10089–10092 (1993).
Bussi, G., Donadio, D. & Parrinello, M. Canonical sampling through velocity rescaling. J. Chem. Phys. 126, 014101 (2007).
Parrinello, M. & Rahman, A. Polymorphic transitions in single crystals: a new molecular dynamics method. J. Appl. Phys. 52, 7182–7190 (1981).
Huang, L. GFP & PAB1 training data of EvoPlay. Figshare https://doi.org/10.6084/m9.figshare.23498195 (2023).
Huang, L. GB1 & PhoQ data of EvoPlay. Figshare https://doi.org/10.6084/m9.figshare.21767369.v3 (2023).
Huang, L. Peptide and receptor sequences of the wild type for 1ssc, 2cnz, 3r7g and 6seo. Figshare https://doi.org/10.6084/m9.figshare.23375666.v1 (2023).
Huang, L. EvoPlay Figs. 2–5 Source Data. Figshare https://doi.org/10.6084/m9.figshare.23437295.v1 (2023).
melobio. melobio/EvoPlay: v1.0.0. Zenodo https://doi.org/10.5281/zenodo.8059425 (2023).
Meng, Y. Self-play reinforcement learning guides protein engineering. Code Ocean https://doi.org/10.24433/CO.1846781.v2 (2023).
Acknowledgements
This research is supported by the National Key Research and Development Program of China from the Ministry of Science and Technology of the People's Republic of China (2022YFF1202200, 2022YFF1202203) and by the Science, Technology and Innovation Commission of Shenzhen Municipality (grant JSGGZD20220822095802006).
Author information
Authors and Affiliations
Contributions
M.Y. conceived the problem and designed all detailed studies. Y.W., H.T., L.H. and L.Y. performed analysis. L.P. conducted luciferase validation experiments. F.M. and H.Y. provided strategic guidance. M.Y. wrote the manuscript and Y.W. made modifications.
Corresponding authors
Ethics declarations
Competing interests
Twenty-nine newly designed and validated GLuc mutants are undergoing patent filing (PCT/CN2023/087445). F.M. declares stock holdings in MGI. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Christian Dallago, Dominik Niopek and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Logo indicating the frequencies of amino acids from 30 MSAs of GLuc.
The sequence logo represents each column of the MSA as a stack of letters, where the height of each letter within a stack is proportional to its frequency and the overall height of each stack reflects the information content of that position, which decreases as the entropy of the letter distribution increases.
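As context for how such a logo is derived (the paper cites Logomaker for logo generation; this is an illustrative sketch, not the exact pipeline used for Extended Data Fig. 1), an information-content logo scales each stack to log2(20) minus the column's Shannon entropy, so conserved columns stand tall and variable ones shrink:

```python
import math
from collections import Counter

def logo_columns(msa):
    """Per-column letter frequencies and information content (bits) of an MSA.
    Stack height = log2(20) - column entropy for the 20 amino acids;
    gap characters are ignored when computing frequencies."""
    result = []
    for j in range(len(msa[0])):
        column = [seq[j] for seq in msa if seq[j] != "-"]
        freqs = {aa: n / len(column) for aa, n in Counter(column).items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        result.append((freqs, math.log2(20) - entropy))
    return result
```

For a fully conserved column the stack reaches the maximum log2(20) ≈ 4.32 bits; a column split across several residues carries correspondingly less information.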
Supplementary information
Supplementary Information
Supplementary Notes, Figs. 1–15 and Tables 1–19.
Supplementary Data 1
30-sequence MSA including the WT GLuc sequence, obtained from the online DeepMSA v1 pipeline.
Supplementary Data 2
Starting library of 164 variants constructed from random in-house mutant libraries (L2, L3 and L4). This library was used for EvoPlay surrogate training and served as the starting sequence pool for EvoPlay optimization.
Supplementary Data 3
Top predicted variants, validated by bioluminescence assay, resulting from ten independent runs of EvoPlay.
Source data
Source Data Fig. 2
Source data for Fig. 2a–d.
Source Data Fig. 3
Source data for Fig. 3a–g,i.
Source Data Fig. 4
Source data for Fig. 4b–d.
Source Data Fig. 5
Source data for Fig. 5b,c.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Y., Tang, H., Huang, L. et al. Self-play reinforcement learning guides protein engineering. Nat Mach Intell 5, 845–860 (2023). https://doi.org/10.1038/s42256-023-00691-9