Rich Parameterization Improves RNA Structure Prediction

Zakov, Shay; Goldberg, Yoav; Elhadad, Michael; Ziv-Ukelson, Michal

doi:10.1007/978-3-642-20036-6_48

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6577)

Included in the following conference series:

International Conference on Research in Computational Molecular Biology

Abstract

Motivation. Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations have so far remained fairly constant, and all models to date have relatively few parameters. We propose a move to much richer parameterizations.

Contribution. We study how much the prediction quality of folding prediction models can be improved by increasing the amount of information they use. To this end, we propose novel models that refine previous ones by examining more types of structural elements, and larger sequential contexts for these elements. We argue that with suitable learning techniques, a large enough set of examples, and no requirement that feature weights be experimentally determinable, one can define much richer feature representations than have previously been explored, while still allowing efficient inference. Our proposed fine-grained models are made practical by the availability of large training sets, advances in machine learning, and recent accelerations of RNA folding algorithms.
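To make the idea of "larger sequential contexts" concrete, the following is a minimal sketch of a sparse linear scoring model over base pairs. It is an illustration only and not the feature set or scoring function of the paper: the function names `context_features` and `score_structure`, the feature-string format, the `radius` parameter, and the toy weights are all assumptions introduced here.

```python
def context_features(seq, i, j, radius):
    """Hypothetical features for a base pair (i, j).

    Emits one context-free feature (the paired bases) and one feature that
    also includes up to `radius` neighbouring bases on each side of i and j.
    """
    def window(center):
        lo = max(0, center - radius)
        hi = min(len(seq), center + radius + 1)
        return seq[lo:hi]

    return [
        f"pair:{seq[i]}{seq[j]}",
        f"pair_ctx{radius}:{window(i)}|{window(j)}",
    ]


def score_structure(seq, pairs, weights, radius=1):
    """Score a structure as the sum of the weights of its active features.

    Features missing from `weights` contribute zero, so the parameter table
    only needs to store features that were given a non-zero weight.
    """
    total = 0.0
    for i, j in pairs:
        for feature in context_features(seq, i, j, radius):
            total += weights.get(feature, 0.0)
    return total


# Toy weights, chosen by hand purely for illustration (not learned values).
toy_weights = {"pair:GC": 1.5, "pair_ctx1:GGA|UCC": 0.5}
toy_seq = "GGAAAUCC"
toy_pairs = [(0, 7), (1, 6)]
toy_score = score_structure(toy_seq, toy_pairs, toy_weights)
```

In this sketch, widening `radius` increases the number of distinct context strings and therefore the number of potential parameters, which is the general trade-off the paragraph above describes: more parameters to estimate in exchange for a finer-grained model.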

Results. To test our assumption, we conducted a set of experiments that assess the prediction quality of the proposed models. These experiments reproduce the settings used in a recent thorough study that compared the prediction quality of several state-of-the-art RNA folding prediction algorithms. We show that applying more detailed models indeed improves prediction quality, while the running time of the folding algorithm remains fast. An additional important outcome of these experiments is a new RNA folding prediction model (coupled with a freely available implementation), which yields significantly higher prediction quality than previous models. This final model has about 70,000 free parameters, several orders of magnitude more than previous models. Trained and tested on the same comprehensive data sets, our model achieves a score of 84% according to the F₁-measure over correctly predicted base pairs (i.e., a 16% error rate), compared to the previously best reported score of 70% (i.e., a 30% error rate). That is, the new model yields an error reduction of about 50%.
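The arithmetic behind these figures can be checked with a short sketch. The code below computes an F₁-measure over base pairs for a single sequence and the relative error reduction between two scores. The function names and the toy structures are assumptions made for illustration; in particular, how the paper aggregates scores across sequences is not stated in the abstract and is not modelled here.

```python
def f1_over_base_pairs(predicted, reference):
    """F1-measure over base pairs, each given as an (i, j) index tuple."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # correct / predicted pairs
    recall = true_positives / len(reference)     # correct / reference pairs
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(old_score, new_score):
    """Fraction of the old error (1 - score) removed by the new score."""
    old_error = 1.0 - old_score
    new_error = 1.0 - new_score
    return (old_error - new_error) / old_error


# Toy structures (hypothetical): two of three predicted pairs are correct.
toy_reference = {(0, 9), (1, 8), (2, 7)}
toy_predicted = {(0, 9), (1, 8), (3, 6)}
toy_f1 = f1_over_base_pairs(toy_predicted, toy_reference)

# Scores quoted in the abstract: 70% (previous best) and 84% (new model).
reduction = relative_error_reduction(0.70, 0.84)
```

With the quoted scores, the error drops from 0.30 to 0.16, so `reduction` is 0.14 / 0.30, roughly 0.47, which is consistent with the "about 50%" stated above.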

Availability. Additional supporting material, trained models, and source code are available through our website at http://www.cs.bgu.ac.il/~negevcb/contextfold

Author information

Authors and Affiliations

Department of Computer Science, Ben-Gurion University of the Negev, Israel
Shay Zakov, Yoav Goldberg, Michael Elhadad & Michal Ziv-Ukelson

Editor information

Editors and Affiliations

EBU3b, University of California San Diego, #4218, 9500 Gilman Drive, 92093-0404, La Jolla, CA, USA
Vineet Bafna
School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
S. Cenk Sahinalp

About this paper

Cite this paper

Zakov, S., Goldberg, Y., Elhadad, M., Ziv-Ukelson, M. (2011). Rich Parameterization Improves RNA Structure Prediction. In: Bafna, V., Sahinalp, S.C. (eds) Research in Computational Molecular Biology. RECOMB 2011. Lecture Notes in Computer Science, vol 6577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20036-6_48

DOI: https://doi.org/10.1007/978-3-642-20036-6_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20035-9
Online ISBN: 978-3-642-20036-6
eBook Packages: Computer Science, Computer Science (R0)