Abstract
RNAs represent a class of programmable biomolecules capable of performing diverse biological functions. Recent studies have developed accurate RNA three-dimensional structure prediction methods, which may enable new RNAs to be designed in a structure-guided manner. Here, we develop a structure-to-sequence deep learning platform for the de novo generative design of RNA aptamers. We show that our approach can design RNA aptamers that are predicted to be structurally similar, yet sequence dissimilar, to known light-up aptamers that fluoresce in the presence of small molecules. We experimentally validate several generated RNA aptamers to have fluorescent activity, show that these aptamers can be optimized for activity in silico, and find that they exhibit a mechanism of fluorescence similar to that of known light-up aptamers. Our results demonstrate how structural predictions can guide the targeted and resource-efficient design of new RNA sequences.
Data availability
The numerical data supporting the findings of this paper are provided in the Source Data, and the sequences can be generated by running RhoDesign. Source Data for Figs. 1 and 2 are available. Sequences generated from RhoDesign and accompanying data are available as Supplementary Data 1. The training dataset and model checkpoints for RhoDesign are available from Zenodo (ref. 60). The PDB structure for Mango-III (A10U), 6UP0, is available from the PDB (ref. 61).
Code availability
RhoDesign is available at https://github.com/ml4bio/RhoDesign and from Zenodo (ref. 60).
References
Cech, T. R., Zaug, A. J. & Grabowski, P. J. In vitro splicing of the ribosomal RNA precursor of Tetrahymena: involvement of a guanosine nucleotide in the excision of the intervening sequence. Cell 27, 487–496 (1981).
Guerrier-Takada, C., Gardiner, K., Marsh, T., Pace, N. & Altman, S. The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme. Cell 35, 849–857 (1983).
Statello, L., Guo, C.-J., Chen, L.-L. & Huarte, M. Gene regulation by long non-coding RNAs and its biological functions. Nat. Rev. Mol. Cell Biol. 22, 96–118 (2021).
Dinger, M. E., Mercer, T. R. & Mattick, J. S. RNAs as extracellular signaling molecules. J. Mol. Endocrinol. 40, 151–159 (2008).
Keefe, A. D., Pai, S. & Ellington, A. Aptamers as therapeutics. Nat. Rev. Drug. Discov. 9, 537–550 (2010).
Tuerk, C., MacDougal, S. & Gold, L. RNA pseudoknots that inhibit human immunodeficiency virus type 1 reverse transcriptase. Proc. Natl. Acad. Sci. USA 89, 6988–6992 (1992).
Pardee, K. et al. Rapid, low-cost detection of Zika virus using programmable biomolecular components. Cell 165, 1255–1266 (2016).
Angenent-Mari, N. M., Garruss, A. S., Soenksen, L. R., Church, G. & Collins, J. J. A deep learning approach to programmable RNA switches. Nat. Commun. 11, 5057 (2020).
Valeri, J. A. et al. Sequence-to-function deep learning frameworks for engineered riboregulators. Nat. Commun. 11, 5058 (2020).
Takahashi, M. K. et al. A low-cost paper-based synthetic biology platform for analyzing gut microbiota and host biomarkers. Nat. Commun. 9, 3347 (2018).
Green, A. A., Silver, P. A., Collins, J. J. & Yin, P. Toehold switches: de-novo-designed regulators of gene expression. Cell 159, 925–939 (2014).
Paige, J. S., Wu, K. Y. & Jaffrey, S. R. RNA mimics of green fluorescent protein. Science 333, 642–646 (2011).
Miao, Z. & Westhof, E. RNA structure: advances and assessment of 3D structure prediction. Annu. Rev. Biophys. 46, 483–503 (2017).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Shen, T. et al. E2Efold-3D: end-to-end deep learning method for accurate de novo RNA 3D structure prediction. Preprint at https://arxiv.org/abs/2207.01586 (2022).
Wang, W. et al. trRosettaRNA: automated prediction of RNA 3D structure with transformer network. Nat. Commun. 14, 7266 (2023).
Pearce, R., Li, Y., Omenn, G. S. & Zhang, Y. Fast and accurate ab initio protein structure prediction using deep learning potentials. PLoS Comput. Biol. 18, e1010539 (2022).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Das, R. et al. Assessment of three-dimensional RNA structure prediction in CASP15. Proteins 91, 1747–1770 (2023).
Runge, F., Stoll, D., Falkner, S. & Hutter, F. Learning to design RNA. In International Conference on Learning Representations 2019 https://openreview.net/pdf?id=ByfyHh05tQ (ICLR, 2019).
Wu, M. J., Andreasson, J. O. L., Kladwang, W., Greenleaf, W. & Das, R. Automated design of diverse stand-alone riboswitches. ACS Synth. Biol. 8, 1838–1846 (2019).
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Jing, B. et al. Learning from protein structure with geometric vector perceptrons. In International Conference on Learning Representations https://openreview.net/pdf?id=1YLJDvSx6J4 (ICLR, 2021).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 5998–6008 (NIPS, 2017).
Hsu, C. et al. Learning inverse folding from millions of predicted structures. Proc. Mach. Learn. Res. 162, 8946–8970 (2022).
Yang, X., Yoshizoe, K., Taneda, A. & Tsuda, K. RNA inverse folding using Monte Carlo tree search. BMC Bioinform. 18, 468 (2017).
Joshi, C. K. & Liò, P. gRNAde: a geometric deep learning pipeline for 3D RNA inverse design. Methods Mol. Biol. 2847, 121–135 (2025).
Tan, C. et al. RDesign: hierarchical data-efficient representation learning for tertiary structure-based RNA design. In The Twelfth International Conference on Learning Representations (ICLR, 2024).
Rubio-Largo, Á., Lozano-García, N., Granado-Criado, J. & Vega-Rodríguez, M. A. Solving the RNA inverse folding problem through target structure decomposition and multiobjective evolutionary computation. Appl. Soft Comput. 147, 110779 (2023).
Autour, A. et al. Fluorogenic RNA Mango aptamers for imaging small non-coding RNAs in mammalian cells. Nat. Commun. 9, 656 (2018).
Jeng, S. C. Y. et al. Fluorogenic aptamers resolve the flexibility of RNA junctions using orientation-dependent FRET. RNA 27, 433–444 (2021).
Iwano, N. et al. Generative aptamer discovery using RaptGen. Nat. Comput. Sci. 2, 378–386 (2022).
Jiang, P. et al. MPBind: a meta-motif-based statistical framework and pipeline to predict binding potential of SELEX-derived aptamers. Bioinformatics 30, 2665–2667 (2014).
Jeng, S. C., Chan, H. H., Booy, E. P., McKenna, S. A. & Unrau, P. J. Fluorophore ligand binding and complex stabilization of the RNA Mango and RNA Spinach aptamers. RNA 22, 1884–1892 (2016).
Trachman, R. J. III et al. Structural basis for high-affinity fluorophore binding and activation by RNA Mango. Nat. Chem. Biol. 13, 807–813 (2017).
Liu, L. Y., Ma, T. Z., Zeng, Y. L., Liu, W. & Mao, Z. W. Structural basis of pyridostatin and its derivatives specifically binding to G-quadruplexes. J. Am. Chem. Soc. 144, 11878–11887 (2022).
Han, F. X., Wheelhouse, R. T. & Hurley, L. H. Interactions of TMPyP4 and TMPyP2 with quadruplex DNA. Structural basis for the differential effects on telomerase inhibition. J. Am. Chem. Soc. 121, 3561–3570 (1999).
Rocca, R. et al. Molecular recognition of a carboxy pyridostatin toward G-quadruplex structures: why does it prefer RNA? Chem. Biol. Drug Des. 90, 919–925 (2017).
Chen, X. C. et al. Tracking the dynamic folding and unfolding of RNA G-quadruplexes in live cells. Angew. Chem. Int. Ed. Engl. 57, 4702–4706 (2018).
Ellington, A. D. & Szostak, J. W. In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818–822 (1990).
Lu, X. J., Bussemaker, H. J. & Olson, W. K. DSSR: an integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Res. 43, e142 (2015).
The RNAcentral Consortium. RNAcentral: a hub of information for non-coding RNA sequences. Nucleic Acids Res. 47, D221–D229 (2019).
Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
Huang, Y., Niu, B., Gao, Y., Fu, L. & Li, W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26, 680–682 (2010).
Zhang, C., Shine, M., Pyle, A. M. & Zhang, Y. US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nat. Methods 19, 1109–1115 (2022).
Huang, P.-S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016).
Boniecki, M. J. et al. SimRNA: a coarse-grained method for RNA folding simulations and 3D structure prediction. Nucleic Acids Res. 44, e63 (2016).
Li, Y. et al. Integrating end-to-end learning with deep geometrical potentials for ab initio RNA structure prediction. Nat. Commun. 14, 5745 (2023).
Biesiada, M. et al. Automated RNA 3D structure prediction with RNAComposer. Methods Mol. Biol. 1490, 199–215 (2016).
Baek, M. et al. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 21, 117–121 (2024).
Case, D. A. et al. AmberTools. J. Chem. Inf. Model. 63, 6183–6191 (2023).
Zok, T. et al. RNApdbee 2.0: multifunctional tool for RNA structure annotation. Nucleic Acids Res. 46, W30–W35 (2018).
Fu, L. et al. UFold: fast and accurate RNA secondary structure prediction with deep learning. Nucleic Acids Res. 50, e14 (2022).
Lorenz, R. et al. ViennaRNA package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
Sato, K., Akiyama, M. & Sakakibara, Y. RNA secondary structure prediction using deep learning with thermodynamic integration. Nat. Commun. 12, 941 (2021).
Chen, J. et al. Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions. Preprint at https://arxiv.org/abs/2204.00300 (2022).
Wong, F. et al. Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery. Mol. Syst. Biol. 18, e11081 (2022).
Trachman, R. J. III et al. Structure and functional reselection of the Mango-III fluorogenic RNA aptamer. Nat. Chem. Biol. 15, 472–479 (2019).
Wong, F. et al. Supporting code for: Deep generative design of RNA aptamers using structural predictions. Zenodo https://doi.org/10.5281/zenodo.13892413 (2024).
Trachman, R. J. & Ferre-D'Amare, A. R. Structure of the Mango-III fluorescent aptamer bound to YO3-biotin. Protein Data Bank https://doi.org/10.2210/pdb6UP0/pdb (2019).
Acknowledgements
This work was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under award K25AI168451 (to F.W.), the Swiss National Science Foundation under grant number SNSF_203071 (to A.K.), the National Science Foundation Graduate Research Fellowship (to A.Z.W.), the Research Grants Council of the Hong Kong Special Administrative Region, China (projects CUHK 14222922 and RGC GRF 2151185 to I.K. and project CUHK 24204023 to Y.L.), a grant from the Innovation and Technology Commission of the Hong Kong Special Administrative Region, China (projects GHP/065/21SZ, IDBF24ENG06 and ITS/247/23FP to Y.L.), the National Key R&D Program of China (project 2022ZD0160101 to Y.L.) and the Broad Institute of MIT and Harvard (to J.J.C.). This work is part of the Antibiotics-AI Project, which is directed by J.J.C. and supported by the Audacious Project, Flu Lab, LLC, the Sea Grape Foundation, R. Zander and H. Wyss for the Wyss Foundation, and an anonymous donor. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
F.W. conceived research, performed or directed all experiments, wrote the paper and supervised research. D.H. and L.H. developed RhoDesign and performed computational analyses, with contributions from J.W., Z.H., Q.Y. and I.K. A.K. and A.Z.W. conceived research and performed experiments and analyses. S.O. and A.L. performed experiments. J.R., W.J., T.Z., K.I. and J.X.C. performed analyses. S.Z. conceived research and performed analyses. Y.L. conceived research, performed or directed all analyses and supervised research. J.J.C. conceived and supervised research. All authors assisted with manuscript editing.
Ethics declarations
Competing interests
J.J.C. is the founding scientific advisory board chair of Integrated Biosciences. F.W. is a co-founder of Integrated Biosciences. S.O. has an equity interest in Integrated Biosciences. The other authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks Jianyi Yang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Jie Pan, in collaboration with the Nature Computational Science team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Structural fidelity of RhoFold’s Mango-III prediction, conserved sequence motifs in Mango aptamers, and comparison to aptamers 1-4.
a, (Left) RhoFold-predicted 3D structure of Mango-III, aligned to the ground truth structure for Mango-III in PDB 6UP0. (Right) AlphaFold 3-predicted 3D structure of Mango-III, aligned to the ground truth structure for Mango-III in PDB 6UP0. b, Comparison of the sequences of aptamers 1-4 against Mango sequences. Conserved sequence motifs in Mango aptamers are indicated in red.
Extended Data Fig. 2 AlphaFold 3-predicted 3D and RhoFold-predicted secondary structures.
a, Predicted 3D structures for aptamers 1-4 generated using AlphaFold 3, as detailed in the Methods—RNA 3D structure prediction. RMSD, TM-score, and pLDDT values for each structure as compared to the ground truth structure for Mango-III in PDB 6UP0 are shown. b, Secondary structures for Mango-III and aptamers 1-4, as generated based on the corresponding PDB structure (6UP0; Mango-III) or RhoFold predictions (aptamers 1-4), as detailed in the Methods—RNA secondary structure prediction.
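For readers unfamiliar with the RMSD values reported in panel a, the following is a minimal illustrative sketch of how a root-mean-square deviation is computed between two sets of matched coordinates. It assumes the two structures have already been superimposed and that atoms are paired one-to-one; it is not the authors' pipeline, and the function name and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumption: the structures are already superimposed, and point i in
    coords_a corresponds to point i in coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must be non-empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Accumulate the squared Euclidean distance for each matched pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in angstroms, not taken from PDB 6UP0.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(reference, predicted))  # every pair is 1 angstrom apart, so 1.0
```

In practice, an optimal superposition would be found before computing the RMSD, and structure-alignment tools such as US-align report the RMSD together with the TM-score.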
Supplementary information
Supplementary Information
Supplementary Figs. 1 and 2 and Tables 1–3.
Supplementary Data 1
Generated and tested RNA sequences, in addition to model predictions of fluorescence activity.
Source data
Source Data Figs. 1 and 2
Statistical source data for Figs. 1 and 2.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wong, F., He, D., Krishnan, A. et al. Deep generative design of RNA aptamers using structural predictions. Nat Comput Sci 4, 829–839 (2024). https://doi.org/10.1038/s43588-024-00720-6
DOI: https://doi.org/10.1038/s43588-024-00720-6