Abstract
Ribonucleic acid (RNA) sequence controls many biological functions in the body of living organisms including protein synthesis and gene regulations. The sequence folds according to the low Minimum Free Energy (MFE) which enables it to achieve various roles. Given a target RNA structure, an RNA sequence can be designed to fold accordingly in a process called RNA Inverse Folding. In this paper, we present an RNA Inverse Folding model that performs a single-step look-ahead to select optimal RNA base and base pairs to design RNA sequences. Our proposed model (SPRNA) learns to accurately design RNA sequences based on self-improving policy function. The model recorded state-of-the-art results on two test datasets and very competitive results on others.












Similar content being viewed by others
Data availability
All the datasets used in this work are publicly available. All the references for training and testing datasets are provided in Table 1.
References
Akiba T, Sano S, Yanase T, et al. Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, p. 2623–2631.
Akutsu T. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discrete Appl Math. 2000;104(1–3):45–62.
Alberts B, Johnson A, Lewis J, et al. Molecular motors. Molecular biology of the cell. 4th ed. New York: Garland Science; 2002.
Anderson JW, Sizikova E, Badugu A, et al. FRNAkenstein: multiple target inverse RNA folding. BMC Bioinformatics. 2012;13:1–12.
Anderson-Lee J, Fisker E, Kosaraju V, et al. Principles for predicting RNA secondary structure design difficulty. J Mol Biol. 2016;428(5):748–57.
Anthony T, Tian Z, Barber D. Thinking fast and slow with deep learning and tree search. Advances in Neural Information Processing Systems 2017;30:5366–5376.
Avihoo A, Churkin A, Barash D. RNAexinv: an extended inverse RNA folding from shape and physical attributes to sequences. BMC Bioinformatics. 2011;12:1–8.
Ba JL, Kiros JR, Hinton GE. Layer normalization. 2016. arXiv preprint arXiv:1607.06450
Bellman R. A Markovian decision process. Journal of Mathematics and Mechanics. 1957;6:679–84.
Breiman L. Random forests. Mach Learn. 2001;45:5–32.
Bringmann K, Grandoni F, Saha B, et al. Truly subcubic algorithms for language edit distance and RNA folding via fast bounded-difference min-plus product. SIAM J Comput. 2019;48(2):481–512.
Busch A, Backofen R. Info-RNA—a fast approach to inverse RNA folding. Bioinformatics. 2006;22(15):1823–31.
Cazenave T. Nested Monte-Carlo search. In: Twenty-First International Joint Conference on Artificial Intelligence, 2009.
Cazenave T, Fournier T. Monte Carlo inverse folding. In: Monte Carlo Search: First Workshop, MCS 2020, Held in Conjunction with IJCAI 2020, Virtual Event, January 7, 2021, Proceedings 1. Springer; 2021. p. 84–99.
Chua K, Calandra R, McAllister R, et al. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. Advances in Neural Information Processing Systems 2018;31:4759–4770.
Churkin A, Retwitzer MD, Reinharz V, et al. Design of RNAs: comparing programs for inverse RNA folding. Brief Bioinformatics 2018;19(2):350–8.
Cleaves HJJ, et al. Watson–Crick pairing. Encyclopedia of astrobiology. 2015. p. 2650.
Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–97.
Crick F. Central dogma of molecular biology. Nature. 1970;227(5258):561–3.
Deisenroth M, Rasmussen CE. Pilco: a model-based and data-efficient approach to policy search. In: Proceedings of the 28th International Conference on machine learning (ICML-11), 2011, p. 465–72.
Doherty EA, Batey RT, Masquida B, et al. A universal mode of helix packing in RNA. Nat Struct Biol. 2001;8(4):339–43.
Dromi N, Avihoo A, Barash D. Reconstruction of natural RNA sequences from RNA shape, thermodynamic stability, mutational robustness, and linguistic complexity by evolutionary computation. J Biomol Struct Dyn. 2008;26(1):147–61.
Drory Retwitzer M, Reinharz V, Ponty Y, et al. incaRNAfbinv: a web server for the fragment-based design of RNA sequences. Nucleic Acids Res. 2016;44(W1):W308–14.
Eastman P, Shi J, Ramsundar B, et al. Solving the RNA design problem with reinforcement learning. PLoS Comput Biol. 2018;14(6): e1006176.
Esmaili-Taheri A, Ganjtabesh M. ERD: a fast and reliable tool for RNA design including constraints. BMC Bioinform. 2015;16(1):1–11.
Esmaili-Taheri A, Ganjtabesh M, Mohammad-Noori M. Evolutionary solution for the RNA design problem. Bioinformatics. 2014;30(9):1250–8.
Gao JZ, Li LY, Reidys CM. Inverse folding of RNA pseudoknot structures. Algorithms Mol Biol. 2010;5:1–19.
Garcia-Martin JA, Clote P, Dotu I. RNAifold: a constraint programming algorithm for RNA inverse folding and molecular design. J Bioinform Comput Biol. 2013;11(02):1350001.
Garcia-Martin JA, Dotu I, Clote P. RNAifold 2.0: a web server and software to design custom and RFAM-based RNA molecules. Nucleic Acids Res. 2015;43(W1):W513–21.
Griffiths-Jones S, Bateman A, Marshall M, et al. Rfam: an RNA family database. Nucleic Acids Res. 2003;31(1):439–41.
Hamada M, Kiryu H, Sato K, et al. Prediction of RNA secondary structure using generalized centroid estimators. Bioinformatics. 2009;25(4):465–73.
Hamming RW. Error detecting and error correcting codes. Bell Syst Tech J. 1950;29(2):147–60.
He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, p. 770–8.
Hochreiter S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertain Fuzziness Knowl-Based Syst. 1998;6(02):107–16.
Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Res. 2003;31(13):3429–31.
Hofacker IL, Fontana W, Stadler PF, et al. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie/Chem Mon. 1994;125(2):167–88.
Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989;2(5):359–66.
Ieong S, Kao MY, Lam TW, et al. Predicting RNA secondary structures with arbitrary pseudoknots by maximizing the number of stacking pairs. J Comput Biol. 2003;10(6):981–95.
Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. PMLR; 2015. p. 448–56.
Isaacs FJ, Dwyer DJ, Ding C, et al. Engineered riboregulators enable post-transcriptional control of gene expression. Nat Biotechnol. 2004;22(7):841–7.
Jain S, Tao Y, Schlick T. Inverse folding with RNA-as-graphs produces a large pool of candidate sequences with target topologies. J Struct Biol. 2020;209(3): 107438.
Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. 2016. arXiv preprint arXiv:1609.02907.
Kleinkauf R, Houwaart T, Backofen R, et al. antaRNA-multi-objective inverse folding of pseudoknot RNA using ant-colony optimization. BMC Bioinform. 2015;16(1):1–7.
Kleinkauf R, Mann M, Backofen R. antaRNA: ant colony-based RNA sequence design. Bioinformatics. 2015;31(19):3114–21.
Kocsis L, Szepesvári C. Bandit based Monte-Carlo planning. In: European Conference on Machine Learning. Springer; 2006. p. 282–93.
Levin A, Lis M, Ponty Y, et al. A global sampling approach to designing and reengineering RNA secondary structures. Nucleic Acids Res. 2012;40(20):10041–52.
Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. 2015. arXiv preprint arXiv:1509.02971.
Lorenz R, Bernhart SH, Hönerzu Siederdissen C, et al. ViennaRNA package 2.0. Algorithms Mol Biol. 2011;6:1–14.
Lyngsø RB, Pedersen CN. RNA pseudoknot prediction in energy-based models. J Comput Biol. 2000;7(3–4):409–27.
MacQueen J, et al. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, 1967, p. 281–97.
Mathews DH, Disney MD, Childs JL, et al. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci. 2004;101(19):7287–92.
Minuesa G, Alsina C, Garcia-Martin JA, et al. MoiRNAifold: a novel tool for complex in silico RNA design. Nucleic Acids Res. 2021;49(9):4934–43.
Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning. 2013. arXiv preprint arXiv:1312:5602
Mnih V, Badia AP, Mirza M, et al. Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning. PMLR; 2016. p. 1928–37.
Nussinov R, Jacobson AB. Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci. 1980;77(11):6309–13.
Obonyo S, Nicolas J, Owuor D. Designing RNA sequences through self-play. In: IJCCI; 2022. p. 305–12.
Portela F. An unexpectedly effective Monte Carlo technique for the RNA inverse folding problem. BioRxiv. 2018:345587.
Quinlan JR. Induction of decision trees. Mach Learn. 1986;1:81–106.
Reinharz V, Ponty Y, Waldispühl J. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution. Bioinformatics. 2013;29(13):i308–15.
Retwitzer MD, Reinharz V, Churkin A, et al. incaRNAfbinv 2.0: a webserver and software with motif control for fragment-based design of RNAs. Bioinformatics. 2020;36(9):2920–2.
Rish I, et al. An empirical study of the naive bayes classifier. In: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, 2001, p. 41–6.
Rosenblatt F. The perceptron, a perceiving and recognizing automaton project para. Cornell Aeronautical Laboratory; 1957.
Rosin CD. Nested rollout policy adaptation for Monte Carlo tree search. In: IJCAI; 2011. p. 649–4.
Rummery GA, Niranjan M. On-line Q-learning using connectionist systems, vol. 37. Cambridge: University of Cambridge, Department of Engineering; 1994.
Runge F, Stoll D, Falkner S, et al. Learning to design RNA. 2018. arXiv preprint arXiv:1812.11951.
Saha B. Fast & space-efficient approximations of language edit distance and RNA folding: an amnesic dynamic programming approach. In: 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS). IEEE; 2017. p. 295–306.
Schaffner KF. The Watson–Crick model and reductionism. Br J Philos Sci. 1969;20(4):325–48.
Schlichtkrull M, Kipf TN, Bloem P, et al. Modeling relational data with graph convolutional networks. In: The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15. Springer; 2018. p. 593–607.
Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization. In: International Conference on Machine Learning. PMLR; 2015. p. 1889–97.
Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms. 2017. arXiv preprint arXiv:1707.06347.
Silver D, Huang A, Maddison CJ, et al. Mastering the game of go with deep neural networks and tree search. Nature. 2016;529(7587):484–9.
Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of go without human knowledge. Nature. 2017;550(7676):354–9.
Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929–58.
Sutton RS. Dyna, an integrated architecture for learning, planning, and reacting. ACM Sigart Bull. 1991;2(4):160–3.
Sutton RS, Barto AG. Reinforcement learning: an introduction. Cambridge: MIT Press; 2018.
Świechowski M, Godlewski K, Sawicki B, et al. Monte Carlo tree search: a review of recent modifications and applications. Artif Intell Rev. 2023;56(3):2497–562.
Taneda A. Multi-objective optimization for RNA design with multiple target secondary structures. BMC Bioinform. 2015;16(1):1–20.
Trotta E. On the normalization of the minimum free energy of RNAs by sequence length. PLoS ONE. 2014;9(11): e113380.
Vinyals O, Babuschkin I, Czarnecki WM, et al. Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature. 2019;575(7782):350–4.
Wang T, Wei JJ, Sabatini DM, et al. Genetic screens in human cells using the CRISPR-Cas9 system. Science. 2014;343(6166):80–4.
Watford M, Wu G. Protein. Adv Nutr. 2018;9(5):651–3.
Watkins CJ, Dayan P. Q-learning. Mach Learn. 1992;8:279–92.
Weinbrand L, Avihoo A, Barash D. RNAfbinv: an interactive java application for fragment-based design of RNA sequences. Bioinformatics. 2013;29(22):2938–40.
Williams RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn. 1992;8:229–56.
Wold S, Esbensen K, Geladi P. Principal component analysis. Chemom Intell Lab Syst. 1987;2(1–3):37–52.
Yang X, Yoshizoe K, Taneda A, et al. RNA inverse folding using Monte Carlo tree search. BMC Bioinform. 2017;18(1):1–12.
Zemora G, Waldsich C. RNA folding in living cells. RNA Biol. 2010;7(6):634–41.
Zuker M, Mathews DH, Turner DH. Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In: RNA biochemistry and biotechnology. Berlin: Springer; 1999. p. 11–43.
Acknowledgements
This work was granted access to the HPC/AI resources of IDRIS under the allocation 2023-AD010614071 made by GENCI.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Advances on Computational Intelligence 2022” guest edited by Joaquim Filipe, Kevin Warwick, Janusz Kacprzyk, Thomas Bäck, Bas van Stein, Christian Wagner, Jonathan Garibaldi, H. K. Lam, Marie Cottrell and Faiyaz Docto.
Appendices
Appendix 1 Value Network Training
The value network was trained according to the following parameters and hyperparameter constraints. These were selected using Optuna [1]. framework (Table 10).
Appendix 2 One-Hot Feature Encoding Results
The experiments according to one-hot encoding did not yield better scores compared to the coding scheme of the results presented in the experiment section of this paper (Table 11).
Appendix 3 Software and Hardware Settings
The SPRNA was trained with PyTorch 1.13 and Python 3.7. All the experiments were run on NVIDIA V100.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Obonyo, S., Jouandeau, N. & Owuor, D. Self-Playing RNA Inverse Folding. SN COMPUT. SCI. 5, 403 (2024). https://doi.org/10.1007/s42979-024-02659-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s42979-024-02659-x