Abstract
Since many functional RNAs form stable secondary structures which are related to their functions, RNA secondary structure prediction is a crucial problem in bioinformatics. We propose a novel model for generating RNA secondary structures based on a non-parametric Bayesian approach, called hierarchical Dirichlet processes for stochastic context-free grammars (HDP-SCFGs). Here non-parametric means that some meta-parameters, such as the number of non-terminal symbols and production rules, do not have to be fixed. Instead their distributions are inferred in order to be adapted (in the Bayesian sense) to the training sequences provided. The results of our RNA secondary structure predictions show that HDP-SCFGs are more accurate than the MFE-based and other generative models.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Zuker, M., Stiegler, P.: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9(1), 133–148 (1981)
Hofacker, I.L.: Vienna RNA secondary structure server. Nucleic Acids Res. 31(13), 3429–3431 (2003)
Eddy, S.R., Durbin, R.: RNA sequence analysis using covariance models. Nucleic Acids Res. 22(11), 2079–2088 (1994)
Sakakibara, Y., Brown, M., Hughey, R., Mian, I.S., Sjölander, K., Underwood, R.C., Haussler, D.: Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res. 22(23), 5112–5120 (1994)
Knudsen, B., Hein, J.: RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 15(6), 446–454 (1999)
Rivas, E., Eddy, S.R.: Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics 2, 8 (2001)
Eddy, S.R.: A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics 3, 18 (2002)
Sakakibara, Y.: Pair hidden Markov models on tree structures. Bioinformatics 19(suppl. 1), i232–i240 (2003)
Sato, K., Sakakibara, Y.: RNA secondary structural alignment with conditional random fields. Bioinformatics 21(suppl. 2), ii237–ii242 (2005)
Dowell, R.D., Eddy, S.R.: Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics 5, 71 (2004)
Pedersen, J.S., Bejerano, G., Siepel, A., Rosenbloom, K., Lindblad-Toh, K., Lander, E.S., Kent, J., Miller, W., Haussler, D.: Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput. Biol. 2(4), e33 (2006)
Do, C.B., Woods, D.A., Batzoglou, S.: CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 22(14), e90–e98 (2006)
Rivas, E., Eddy, S.R.: Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics 16(7), 583–605 (2000)
Liang, P., Petrov, S., Jordan, M.I., Klein, D.: The infinite PCFG using hierarchical Dirichlet processes. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 688–697 (2007)
Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis. Cambridge University Press, Cambridge (1998)
Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.: Hierarchical Dirichlet processes. Journal of the American Statistical Association 101, 1566–1581 (2006)
Blei, D.M., Jordan, M.I.: Variational inference for Dirichlet process mixtures. Bayesian Analysis 1, 121–144 (2005)
Hamada, M., Kiryu, H., Sato, K., Mituyama, T., Asai, K.: Prediction of RNA secondary structure using generalized centroid estimators. Bioinformatics 25(4), 465–473 (2009)
Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S.R., Bateman, A.: Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33(Database issue), D121–D124 (2005)
Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning, pp. 282–289 (2001)
McCaskill, J.S.: The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29(6-7), 1105–1119 (1990)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sato, K., Hamada, M., Mituyama, T., Asai, K., Sakakibara, Y. (2009). A Non-parametric Bayesian Approach for Predicting RNA Secondary Structures. In: Salzberg, S.L., Warnow, T. (eds) Algorithms in Bioinformatics. WABI 2009. Lecture Notes in Computer Science(), vol 5724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04241-6_24
Download citation
DOI: https://doi.org/10.1007/978-3-642-04241-6_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04240-9
Online ISBN: 978-3-642-04241-6
eBook Packages: Computer ScienceComputer Science (R0)