Stochastic Sampling of Structural Contexts Improves the Scalability and Accuracy of RNA 3D Module Identification

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2020)

Abstract

RNA structures possess multiple levels of structural organization. Secondary structures are made of canonical (i.e., Watson-Crick and wobble) helices, connected by loops whose local conformations are critical determinants of global 3D architectures. Such local 3D structures consist of conserved sets of non-canonical base pairs, called RNA modules. Their prediction from sequence data is thus a milestone toward 3D structure modelling. Unfortunately, the computational efficiency and scope of current 3D module identification methods are still too limited to benefit from all the knowledge accumulated in module databases. Here, we introduce BayesPairing 2, a new sequence search algorithm leveraging secondary structure tree decomposition, which reduces the computational complexity and improves predictions on new sequences. We benchmarked our method on 75 modules and 6380 RNA sequences, and report accuracies that are comparable to the state of the art, with considerable running time improvements. When identifying 200 modules on a single sequence, BayesPairing 2 is over 100 times faster than the previous version, opening new doors for genome-wide applications.
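To make the notion of helices "connected by loops" concrete, the following minimal sketch finds hairpin loops in a secondary structure. It assumes the structure is given in dot-bracket notation without pseudoknots, and the function names `parse_pairs` and `hairpin_loops` are hypothetical. It is an illustration of the loop concept only, not the BayesPairing 2 algorithm.

```python
def parse_pairs(dot_bracket):
    """Map each paired position to its partner (0-based indices)."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def hairpin_loops(dot_bracket):
    """Return closing pairs (i, j) that enclose only unpaired positions."""
    pairs = parse_pairs(dot_bracket)
    loops = []
    for i, j in sorted(pairs.items()):
        # Keep each pair once (i < j) and require an unpaired interior.
        if i < j and all(k not in pairs for k in range(i + 1, j)):
            loops.append((i, j))
    return loops


if __name__ == "__main__":
    print(hairpin_loops("(((...)))..((....))"))
```

Each returned pair closes one hairpin loop. Regions like these, together with interior and multi-branched loops, are where non-canonical base pairs can occur.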

Acknowledgements

The authors are greatly indebted to Anton Petrov for providing us with alignments between RNA PDB structures and Rfam families, which helped us match 3D modules to sequence alignments.

Corresponding author

Correspondence to Jérôme Waldispühl.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sarrazin-Gendron, R., Yao, HT., Reinharz, V., Oliver, C.G., Ponty, Y., Waldispühl, J. (2020). Stochastic Sampling of Structural Contexts Improves the Scalability and Accuracy of RNA 3D Module Identification. In: Schwartz, R. (ed.) Research in Computational Molecular Biology. RECOMB 2020. Lecture Notes in Computer Science, vol. 12074. Springer, Cham. https://doi.org/10.1007/978-3-030-45257-5_12

  • DOI: https://doi.org/10.1007/978-3-030-45257-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-45256-8

  • Online ISBN: 978-3-030-45257-5

  • eBook Packages: Computer Science, Computer Science (R0)
