Rich Parameterization Improves RNA Structure Prediction

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2011)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6577)

Abstract

Motivation. Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations have so far remained fairly constant, and all models to date have relatively few parameters. We propose a move to much richer parameterizations.

Contribution. We study how much the prediction quality of folding prediction models improves when the models utilize more information. To this end, we propose novel models that refine previous ones by examining more types of structural elements and larger sequential contexts for these elements. We argue that, with suitable learning techniques, without being tied to features whose weights could be determined experimentally, and with a large enough set of examples, one can define much richer feature representations than were previously explored while still allowing efficient inference. Our proposed fine-grained models are made practical by the availability of large training sets, advances in machine learning, and recent accelerations of RNA folding algorithms.
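To make the idea of "larger sequential contexts" concrete, the following is a minimal toy sketch, not the feature set or scoring function of the paper. It assumes a hypothetical linear model in which each base pair contributes one feature keyed by the nucleotides around its two endpoints. The names `pair_context_feature`, `score_structure`, and `feature_space_size` are illustrative assumptions. The sketch shows how quickly the number of possible features grows with the context width, which is why such models need large training sets rather than experimentally measured weights.

```python
def pair_context_feature(seq, i, j, context):
    """Hypothetical feature key for base pair (i, j): the nucleotides within
    `context` positions of each endpoint ('-' marks positions off the sequence)."""
    def window(center):
        return "".join(
            seq[k] if 0 <= k < len(seq) else "-"
            for k in range(center - context, center + context + 1)
        )
    return ("pair", context, window(i), window(j))


def score_structure(seq, pairs, weights, context):
    """Linear score: sum of the learned weights of the features that fire.
    Features missing from `weights` contribute 0."""
    return sum(
        weights.get(pair_context_feature(seq, i, j, context), 0.0)
        for i, j in pairs
    )


def feature_space_size(context):
    """Upper bound on distinct pair features over the alphabet {A, C, G, U}:
    two windows of (2 * context + 1) nucleotides each (padding ignored)."""
    return 4 ** (2 * (2 * context + 1))


seq = "GGGAAACCC"
pairs = [(0, 8), (1, 7), (2, 6)]            # a simple hairpin stem
weights = {("pair", 0, "G", "C"): 1.5}      # toy weight, not a trained value

print(score_structure(seq, pairs, weights, context=0))
for c in range(3):
    print(c, feature_space_size(c))
```

With no context there are at most 16 pair features. One flanking nucleotide on each side raises the bound to 4,096, and two raise it to over a million, so only features that actually occur in the training data can receive meaningful weights.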

Results. To test our assumption, we conducted a set of experiments that assess the prediction quality of the proposed models. These experiments reproduce the settings applied in a recent thorough study that compared the prediction quality of several state-of-the-art RNA folding prediction algorithms. We show that applying more detailed models indeed improves prediction quality, while the corresponding running time of the folding algorithm remains fast. An additional important outcome of this experiment is a new RNA folding prediction model (coupled with a freely available implementation), which yields significantly higher prediction quality than previous models. This final model has about 70,000 free parameters, several orders of magnitude more than previous models. Trained and tested on the same comprehensive data sets, our model achieves a score of 84% according to the F1-measure over correctly predicted base pairs (i.e., a 16% error rate), compared to the previously best reported score of 70% (i.e., a 30% error rate). That is, the new model yields a relative error reduction of about 50%.
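The arithmetic behind these figures can be sketched as follows. This is an illustrative computation only: it assumes the common definition of F1 as the harmonic mean of precision and recall over predicted base pairs, and the function names are hypothetical. How the benchmark aggregates scores across sequences is not specified in the abstract and is not modeled here.

```python
def base_pair_f1(predicted, reference):
    """F1 over base pairs: harmonic mean of precision and recall,
    where a pair (i, j) counts as correct if it appears in the reference."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(old_score, new_score):
    """Fraction of the old error (1 - score) removed by the new model."""
    old_error = 1.0 - old_score
    new_error = 1.0 - new_score
    return (old_error - new_error) / old_error


# Toy example: two of three predicted pairs match the reference structure.
f1 = base_pair_f1([(0, 8), (1, 7), (2, 5)], [(0, 8), (1, 7), (2, 6)])
print(round(f1, 3))

# Scores reported in the abstract: 70% (previous best) and 84% (new model).
reduction = relative_error_reduction(0.70, 0.84)
print(round(reduction, 3))
```

With the reported scores, the error drops from 30% to 16%, that is, by 14 of 30 percentage points, or roughly 47%, which the abstract rounds to "about 50%".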

Availability. Additional supporting material, trained models, and source code are available through our website at http://www.cs.bgu.ac.il/~negevcb/contextfold

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zakov, S., Goldberg, Y., Elhadad, M., Ziv-Ukelson, M. (2011). Rich Parameterization Improves RNA Structure Prediction. In: Bafna, V., Sahinalp, S.C. (eds) Research in Computational Molecular Biology. RECOMB 2011. Lecture Notes in Computer Science, vol 6577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20036-6_48

  • DOI: https://doi.org/10.1007/978-3-642-20036-6_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20035-9

  • Online ISBN: 978-3-642-20036-6

  • eBook Packages: Computer Science, Computer Science (R0)
