Abstract
RNA structures possess multiple levels of structural organization. Secondary structures are made of canonical (i.e. Watson-Crick and wobble) helices, connected by loops whose local conformations are critical determinants of global 3D architectures. Such local 3D structures consist of conserved sets of non-canonical base pairs, called RNA modules. Their prediction from sequence data is thus a milestone toward 3D structure modelling. Unfortunately, the computational efficiency and scope of current 3D module identification methods are still too limited to benefit from all the knowledge accumulated in module databases. Here, we introduce BayesPairing 2, a new sequence search algorithm that leverages secondary structure tree decomposition to reduce computational complexity and improve predictions on new sequences. We benchmarked our methods on 75 modules and 6380 RNA sequences, and report accuracies comparable to the state of the art, with considerable running time improvements. When identifying 200 modules on a single sequence, BayesPairing 2 is over 100 times faster than its previous version, opening new doors for genome-wide applications.
Acknowledgements
The authors are greatly indebted to Anton Petrov for providing us with alignments between RNA PDB structures and Rfam families, which helped us match 3D modules to sequence alignments.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sarrazin-Gendron, R., Yao, HT., Reinharz, V., Oliver, C.G., Ponty, Y., Waldispühl, J. (2020). Stochastic Sampling of Structural Contexts Improves the Scalability and Accuracy of RNA 3D Module Identification. In: Schwartz, R. (eds) Research in Computational Molecular Biology. RECOMB 2020. Lecture Notes in Computer Science, vol 12074. Springer, Cham. https://doi.org/10.1007/978-3-030-45257-5_12
DOI: https://doi.org/10.1007/978-3-030-45257-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45256-8
Online ISBN: 978-3-030-45257-5
eBook Packages: Computer Science, Computer Science (R0)