RNA Structural Homology Search with a Succinct Stochastic Grammar Model

  • Regular Paper
Journal of Computer Science and Technology

Abstract

An increasing number of structural homology search tools, mostly based on profile stochastic context-free grammars (SCFGs), have recently been developed for non-coding RNA gene identification. SCFGs can capture the statistical biases that often occur in RNA sequences, which is necessary for profiling specific RNA structures for structural homology search. In this paper, a succinct stochastic grammar model for RNA is introduced that has competitive search effectiveness. More importantly, the profiling model can be easily extended to include pseudoknots, structures that are beyond the capability of profile SCFGs. In addition, the model allows heuristics to be exploited, resulting in a significant speed-up of the CYK algorithm-based search.

Author information

Corresponding author

Correspondence to Ying-Lei Song.

Additional information

Ying-Lei Song received his B.S. degree in physics from Tsinghua University in 1998, and his M.S. degree in computer science from Ohio University in 2003. He is at present a Ph.D. candidate in the Department of Computer Science at the University of Georgia. His research interests concentrate on designing efficient algorithms for predicting and studying secondary and tertiary structures of RNAs and proteins.

Ji-Zhen Zhao received his M.S. degree in biology from Peking University in 1997. He is currently a Ph.D. candidate in the Department of Computer Science at the University of Georgia. His research interests focus on modeling of RNA secondary structures and biological networks.

Chun-Mei Liu received her B.E. and M.E. degrees in computer science and engineering from Anhui University in 1999 and 2002, respectively. She is currently a Ph.D. candidate in the Department of Computer Science at the University of Georgia. Her research interests include secondary and tertiary structures of RNAs and proteins, graph theory, and theory of computation.

Kan Liu received his B.S. degree in engineering mechanics from Beijing Institute of Technology, China, in 1995 and his M.S. degree in computer science from the University of Georgia in 2004. He is a Ph.D. candidate at the University of California, Riverside. His research focuses on developing efficient algorithms and software for computational problems in molecular biology and genomics.

Russell L. Malmberg is a professor in the Plant Biology Department, University of Georgia, USA. He received his Ph.D. degree in genetics from the University of Wisconsin, then did postdoctoral work at Michigan State University and Cold Spring Harbor Laboratory, before moving to the University of Georgia. His current research interests are in bioinformatics and evolutionary genetics.

Li-Ming Cai is an associate professor in the Department of Computer Science at the University of Georgia. He received his Ph.D. degree in computer science from Texas A&M University in 1994. He also holds B.S. and M.S. degrees in computer science awarded by Tsinghua University. His current research interests include algorithms, computational biology, and theory of computation.

About this article

Cite this article

Song, YL., Zhao, JZ., Liu, CM. et al. RNA Structural Homology Search with a Succinct Stochastic Grammar Model. J Comput Sci Technol 20, 454–464 (2005). https://doi.org/10.1007/s11390-005-0454-x

  • DOI: https://doi.org/10.1007/s11390-005-0454-x
