Self-Playing RNA Inverse Folding

Obonyo, Stephen; Jouandeau, Nicolas; Owuor, Dickson

doi:10.1007/s42979-024-02659-x

Self-Playing RNA Inverse Folding

Original Research
Published: 06 April 2024

Volume 5, article number 403, (2024)
Cite this article

SN Computer Science Aims and scope Submit manuscript

161 Accesses
Explore all metrics

Abstract

Ribonucleic acid (RNA) sequence controls many biological functions in the body of living organisms including protein synthesis and gene regulations. The sequence folds according to the low Minimum Free Energy (MFE) which enables it to achieve various roles. Given a target RNA structure, an RNA sequence can be designed to fold accordingly in a process called RNA Inverse Folding. In this paper, we present an RNA Inverse Folding model that performs a single-step look-ahead to select optimal RNA base and base pairs to design RNA sequences. Our proposed model (SPRNA) learns to accurately design RNA sequences based on self-improving policy function. The model recorded state-of-the-art results on two test datasets and very competitive results on others.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 2

Fig. 4

Fig. 7

Monte Carlo Inverse RNA Folding

RNA inverse folding using Monte Carlo tree search

Article Open access 06 November 2017

Toward Increasing the Credibility of RNA Design

Data availability

All the datasets used in this work are publicly available. All the references for training and testing datasets are provided in Table 1.

References

Akiba T, Sano S, Yanase T, et al. Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, p. 2623–2631.
Akutsu T. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discrete Appl Math. 2000;104(1–3):45–62.
Article MathSciNet Google Scholar
Alberts B, Johnson A, Lewis J, et al. Molecular motors. Molecular biology of the cell. 4th ed. New York: Garland Science; 2002.
Google Scholar
Anderson JW, Sizikova E, Badugu A, et al. FRNAkenstein: multiple target inverse RNA folding. BMC Bioinformatics. 2012;13:1–12.
Google Scholar
Anderson-Lee J, Fisker E, Kosaraju V, et al. Principles for predicting RNA secondary structure design difficulty. J Mol Biol. 2016;428(5):748–57.
Article Google Scholar
Anthony T, Tian Z, Barber D. Thinking fast and slow with deep learning and tree search. Advances in Neural Information Processing Systems 2017;30:5366–5376.
Google Scholar
Avihoo A, Churkin A, Barash D. RNAexinv: an extended inverse RNA folding from shape and physical attributes to sequences. BMC Bioinformatics. 2011;12:1–8.
Article Google Scholar
Ba JL, Kiros JR, Hinton GE. Layer normalization. 2016. arXiv preprint arXiv:1607.06450
Bellman R. A Markovian decision process. Journal of Mathematics and Mechanics. 1957;6:679–84.
MathSciNet Google Scholar
Breiman L. Random forests. Mach Learn. 2001;45:5–32.
Article Google Scholar
Bringmann K, Grandoni F, Saha B, et al. Truly subcubic algorithms for language edit distance and RNA folding via fast bounded-difference min-plus product. SIAM J Comput. 2019;48(2):481–512.
Article MathSciNet Google Scholar
Busch A, Backofen R. Info-RNA—a fast approach to inverse RNA folding. Bioinformatics. 2006;22(15):1823–31.
Article Google Scholar
Cazenave T. Nested Monte-Carlo search. In: Twenty-First International Joint Conference on Artificial Intelligence, 2009.
Cazenave T, Fournier T. Monte Carlo inverse folding. In: Monte Carlo Search: First Workshop, MCS 2020, Held in Conjunction with IJCAI 2020, Virtual Event, January 7, 2021, Proceedings 1. Springer; 2021. p. 84–99.
Chua K, Calandra R, McAllister R, et al. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. Advances in Neural Information Processing Systems 2018;31:4759–4770.
Google Scholar
Churkin A, Retwitzer MD, Reinharz V, et al. Design of RNAs: comparing programs for inverse RNA folding. Brief Bioinformatics 2018;19(2):350–8.
Google Scholar
Cleaves HJJ, et al. Watson–Crick pairing. Encyclopedia of astrobiology. 2015. p. 2650.
Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–97.
Article Google Scholar
Crick F. Central dogma of molecular biology. Nature. 1970;227(5258):561–3.
Article Google Scholar
Deisenroth M, Rasmussen CE. Pilco: a model-based and data-efficient approach to policy search. In: Proceedings of the 28th International Conference on machine learning (ICML-11), 2011, p. 465–72.
Doherty EA, Batey RT, Masquida B, et al. A universal mode of helix packing in RNA. Nat Struct Biol. 2001;8(4):339–43.
Article Google Scholar
Dromi N, Avihoo A, Barash D. Reconstruction of natural RNA sequences from RNA shape, thermodynamic stability, mutational robustness, and linguistic complexity by evolutionary computation. J Biomol Struct Dyn. 2008;26(1):147–61.
Article Google Scholar
Drory Retwitzer M, Reinharz V, Ponty Y, et al. incaRNAfbinv: a web server for the fragment-based design of RNA sequences. Nucleic Acids Res. 2016;44(W1):W308–14.
Article Google Scholar
Eastman P, Shi J, Ramsundar B, et al. Solving the RNA design problem with reinforcement learning. PLoS Comput Biol. 2018;14(6): e1006176.
Article Google Scholar
Esmaili-Taheri A, Ganjtabesh M. ERD: a fast and reliable tool for RNA design including constraints. BMC Bioinform. 2015;16(1):1–11.
Article Google Scholar
Esmaili-Taheri A, Ganjtabesh M, Mohammad-Noori M. Evolutionary solution for the RNA design problem. Bioinformatics. 2014;30(9):1250–8.
Article Google Scholar
Gao JZ, Li LY, Reidys CM. Inverse folding of RNA pseudoknot structures. Algorithms Mol Biol. 2010;5:1–19.
Article Google Scholar
Garcia-Martin JA, Clote P, Dotu I. RNAifold: a constraint programming algorithm for RNA inverse folding and molecular design. J Bioinform Comput Biol. 2013;11(02):1350001.
Article Google Scholar
Garcia-Martin JA, Dotu I, Clote P. RNAifold 2.0: a web server and software to design custom and RFAM-based RNA molecules. Nucleic Acids Res. 2015;43(W1):W513–21.
Article Google Scholar
Griffiths-Jones S, Bateman A, Marshall M, et al. Rfam: an RNA family database. Nucleic Acids Res. 2003;31(1):439–41.
Article Google Scholar
Hamada M, Kiryu H, Sato K, et al. Prediction of RNA secondary structure using generalized centroid estimators. Bioinformatics. 2009;25(4):465–73.
Article Google Scholar
Hamming RW. Error detecting and error correcting codes. Bell Syst Tech J. 1950;29(2):147–60.
Article MathSciNet Google Scholar
He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, p. 770–8.
Hochreiter S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertain Fuzziness Knowl-Based Syst. 1998;6(02):107–16.
Article Google Scholar
Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Res. 2003;31(13):3429–31.
Article Google Scholar
Hofacker IL, Fontana W, Stadler PF, et al. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie/Chem Mon. 1994;125(2):167–88.
Article Google Scholar
Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989;2(5):359–66.
Article Google Scholar
Ieong S, Kao MY, Lam TW, et al. Predicting RNA secondary structures with arbitrary pseudoknots by maximizing the number of stacking pairs. J Comput Biol. 2003;10(6):981–95.
Article Google Scholar
Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. PMLR; 2015. p. 448–56.
Isaacs FJ, Dwyer DJ, Ding C, et al. Engineered riboregulators enable post-transcriptional control of gene expression. Nat Biotechnol. 2004;22(7):841–7.
Article Google Scholar
Jain S, Tao Y, Schlick T. Inverse folding with RNA-as-graphs produces a large pool of candidate sequences with target topologies. J Struct Biol. 2020;209(3): 107438.
Article Google Scholar
Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. 2016. arXiv preprint arXiv:1609.02907.
Kleinkauf R, Houwaart T, Backofen R, et al. antaRNA-multi-objective inverse folding of pseudoknot RNA using ant-colony optimization. BMC Bioinform. 2015;16(1):1–7.
Article Google Scholar
Kleinkauf R, Mann M, Backofen R. antaRNA: ant colony-based RNA sequence design. Bioinformatics. 2015;31(19):3114–21.
Article Google Scholar
Kocsis L, Szepesvári C. Bandit based Monte-Carlo planning. In: European Conference on Machine Learning. Springer; 2006. p. 282–93.
Levin A, Lis M, Ponty Y, et al. A global sampling approach to designing and reengineering RNA secondary structures. Nucleic Acids Res. 2012;40(20):10041–52.
Article Google Scholar
Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. 2015. arXiv preprint arXiv:1509.02971.
Lorenz R, Bernhart SH, Hönerzu Siederdissen C, et al. ViennaRNA package 2.0. Algorithms Mol Biol. 2011;6:1–14.
Article Google Scholar
Lyngsø RB, Pedersen CN. RNA pseudoknot prediction in energy-based models. J Comput Biol. 2000;7(3–4):409–27.
Article Google Scholar
MacQueen J, et al. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, 1967, p. 281–97.
Mathews DH, Disney MD, Childs JL, et al. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci. 2004;101(19):7287–92.
Article Google Scholar
Minuesa G, Alsina C, Garcia-Martin JA, et al. MoiRNAifold: a novel tool for complex in silico RNA design. Nucleic Acids Res. 2021;49(9):4934–43.
Article Google Scholar
Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning. 2013. arXiv preprint arXiv:1312:5602
Mnih V, Badia AP, Mirza M, et al. Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning. PMLR; 2016. p. 1928–37.
Nussinov R, Jacobson AB. Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci. 1980;77(11):6309–13.
Article Google Scholar
Obonyo S, Nicolas J, Owuor D. Designing RNA sequences through self-play. In: IJCCI; 2022. p. 305–12.
Portela F. An unexpectedly effective Monte Carlo technique for the RNA inverse folding problem. BioRxiv. 2018:345587.
Quinlan JR. Induction of decision trees. Mach Learn. 1986;1:81–106.
Article Google Scholar
Reinharz V, Ponty Y, Waldispühl J. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution. Bioinformatics. 2013;29(13):i308–15.
Article Google Scholar
Retwitzer MD, Reinharz V, Churkin A, et al. incaRNAfbinv 2.0: a webserver and software with motif control for fragment-based design of RNAs. Bioinformatics. 2020;36(9):2920–2.
Article Google Scholar
Rish I, et al. An empirical study of the naive bayes classifier. In: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, 2001, p. 41–6.
Rosenblatt F. The perceptron, a perceiving and recognizing automaton project para. Cornell Aeronautical Laboratory; 1957.
Rosin CD. Nested rollout policy adaptation for Monte Carlo tree search. In: IJCAI; 2011. p. 649–4.
Rummery GA, Niranjan M. On-line Q-learning using connectionist systems, vol. 37. Cambridge: University of Cambridge, Department of Engineering; 1994.
Google Scholar
Runge F, Stoll D, Falkner S, et al. Learning to design RNA. 2018. arXiv preprint arXiv:1812.11951.
Saha B. Fast & space-efficient approximations of language edit distance and RNA folding: an amnesic dynamic programming approach. In: 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS). IEEE; 2017. p. 295–306.
Schaffner KF. The Watson–Crick model and reductionism. Br J Philos Sci. 1969;20(4):325–48.
Article Google Scholar
Schlichtkrull M, Kipf TN, Bloem P, et al. Modeling relational data with graph convolutional networks. In: The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15. Springer; 2018. p. 593–607.
Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization. In: International Conference on Machine Learning. PMLR; 2015. p. 1889–97.
Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms. 2017. arXiv preprint arXiv:1707.06347.
Silver D, Huang A, Maddison CJ, et al. Mastering the game of go with deep neural networks and tree search. Nature. 2016;529(7587):484–9.
Article Google Scholar
Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of go without human knowledge. Nature. 2017;550(7676):354–9.
Article Google Scholar
Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929–58.
MathSciNet Google Scholar
Sutton RS. Dyna, an integrated architecture for learning, planning, and reacting. ACM Sigart Bull. 1991;2(4):160–3.
Article Google Scholar
Sutton RS, Barto AG. Reinforcement learning: an introduction. Cambridge: MIT Press; 2018.
Google Scholar
Świechowski M, Godlewski K, Sawicki B, et al. Monte Carlo tree search: a review of recent modifications and applications. Artif Intell Rev. 2023;56(3):2497–562.
Article Google Scholar
Taneda A. Multi-objective optimization for RNA design with multiple target secondary structures. BMC Bioinform. 2015;16(1):1–20.
Article Google Scholar
Trotta E. On the normalization of the minimum free energy of RNAs by sequence length. PLoS ONE. 2014;9(11): e113380.
Article Google Scholar
Vinyals O, Babuschkin I, Czarnecki WM, et al. Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature. 2019;575(7782):350–4.
Article Google Scholar
Wang T, Wei JJ, Sabatini DM, et al. Genetic screens in human cells using the CRISPR-Cas9 system. Science. 2014;343(6166):80–4.
Article Google Scholar
Watford M, Wu G. Protein. Adv Nutr. 2018;9(5):651–3.
Article Google Scholar
Watkins CJ, Dayan P. Q-learning. Mach Learn. 1992;8:279–92.
Article Google Scholar
Weinbrand L, Avihoo A, Barash D. RNAfbinv: an interactive java application for fragment-based design of RNA sequences. Bioinformatics. 2013;29(22):2938–40.
Article Google Scholar
Williams RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn. 1992;8:229–56.
Article Google Scholar
Wold S, Esbensen K, Geladi P. Principal component analysis. Chemom Intell Lab Syst. 1987;2(1–3):37–52.
Article Google Scholar
Yang X, Yoshizoe K, Taneda A, et al. RNA inverse folding using Monte Carlo tree search. BMC Bioinform. 2017;18(1):1–12.
Article Google Scholar
Zemora G, Waldsich C. RNA folding in living cells. RNA Biol. 2010;7(6):634–41.
Article Google Scholar
Zuker M, Mathews DH, Turner DH. Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In: RNA biochemistry and biotechnology. Berlin: Springer; 1999. p. 11–43.
Chapter Google Scholar

Download references

Acknowledgements

This work was granted access to the HPC/AI resources of IDRIS under the allocation 2023-AD010614071 made by GENCI.

Author information

Authors and Affiliations

LIASD, University of Paris 8, Saint-Denis, France
Stephen Obonyo & Nicolas Jouandeau
SCES, Strathmore University, Nairobi, Kenya
Dickson Owuor

Authors

Stephen Obonyo
View author publications
You can also search for this author inPubMed Google Scholar
Nicolas Jouandeau
View author publications
You can also search for this author inPubMed Google Scholar
Dickson Owuor
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Stephen Obonyo.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances on Computational Intelligence 2022” guest edited by Joaquim Filipe, Kevin Warwick, Janusz Kacprzyk, Thomas Bäck, Bas van Stein, Christian Wagner, Jonathan Garibaldi, H. K. Lam, Marie Cottrell and Faiyaz Docto.

Appendices

Appendix 1 Value Network Training

The value network was trained according to the following parameters and hyperparameter constraints. These were selected using Optuna [1]. framework (Table 10).

Table 10 Parameter/hyperparameter for the value network

Full size table

Appendix 2 One-Hot Feature Encoding Results

The experiments according to one-hot encoding did not yield better scores compared to the coding scheme of the results presented in the experiment section of this paper (Table 11).

Table 11 SPRNA with one-hot encoding

Full size table

Appendix 3 Software and Hardware Settings

The SPRNA was trained with PyTorch 1.13 and Python 3.7. All the experiments were run on NVIDIA V100.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Obonyo, S., Jouandeau, N. & Owuor, D. Self-Playing RNA Inverse Folding. SN COMPUT. SCI. 5, 403 (2024). https://doi.org/10.1007/s42979-024-02659-x

Download citation

Received: 05 April 2023
Accepted: 26 January 2024
Published: 06 April 2024
DOI: https://doi.org/10.1007/s42979-024-02659-x

Keywords

Part of a collection:

Advances on Computational Intelligence 2022

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Self-Playing RNA Inverse Folding

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Monte Carlo Inverse RNA Folding

RNA inverse folding using Monte Carlo tree search

Toward Increasing the Credibility of RNA Design

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Appendices

Appendix 1 Value Network Training

Appendix 2 One-Hot Feature Encoding Results

Appendix 3 Software and Hardware Settings

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now