Extending Stochastic Context-Free Grammars for an Application in Bioinformatics

  • Conference paper
Language and Automata Theory and Applications (LATA 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6031)

Abstract

We extend stochastic context-free grammars (SCFGs) so that the probability of applying a production can depend on the length of the subword generated by that application. We show that existing algorithms for training and for determining the most probable parse tree can easily be adapted to the extended model without loss of performance. Furthermore, we show that the extended model is suited to improving the quality of RNA secondary structure predictions.

The extended model may also be applied in other fields where SCFGs are used, such as natural language processing. It additionally raises some interesting questions in the field of formal languages.
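
To illustrate the idea of length-dependent production probabilities, the following is a minimal sketch of a CYK-style Viterbi computation in which each rule probability is looked up together with the length of the subword it generates. It is not the authors' implementation: the toy grammar, the probability values, and the names `rule_prob` and `viterbi_log_prob` are assumptions made for this example only.

```python
import math

# Hypothetical toy grammar in Chomsky normal form (not taken from the paper).
# (A, B, C) stands for A -> B C; (A, a) stands for A -> a.
BINARY = [("S", "S", "S"), ("S", "L", "R")]
TERMINAL = [("S", "x"), ("L", "("), ("R", ")")]


def rule_prob(rule: tuple, length: int) -> float:
    """Assumed probability of `rule` when it generates a subword of `length`.

    For each nonterminal and each length, the values over its rules sum to 1.
    """
    if rule == ("S", "x"):
        return 1.0 if length == 1 else 0.0
    if rule == ("S", "S", "S"):
        if length == 2:
            return 0.4
        return 1.0 if length >= 3 else 0.0
    if rule == ("S", "L", "R"):
        return 0.6 if length == 2 else 0.0
    return 1.0  # L -> ( and R -> ) only ever generate length-1 subwords


def viterbi_log_prob(word: str, start: str = "S") -> float:
    """Log-probability of the most probable parse of `word`, or -inf if none."""
    n = len(word)
    best = {}  # (i, j, A) -> best log-probability that A derives word[i:j]

    # Subwords of length 1: terminal rules.
    for i, ch in enumerate(word):
        for a, t in TERMINAL:
            p = rule_prob((a, t), 1)
            if t == ch and p > 0.0:
                key = (i, i + 1, a)
                best[key] = max(best.get(key, -math.inf), math.log(p))

    # Longer subwords: the only change from the usual CYK recursion is that
    # the rule probability is looked up with the span length.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for a, b, c in BINARY:
                p = rule_prob((a, b, c), span)
                if p == 0.0:
                    continue
                for k in range(i + 1, j):
                    left = best.get((i, k, b))
                    right = best.get((k, j, c))
                    if left is None or right is None:
                        continue
                    score = math.log(p) + left + right
                    if score > best.get((i, j, a), -math.inf):
                        best[(i, j, a)] = score

    return best.get((0, n, start), -math.inf)
```

In this sketch the span length is already known at every step of the dynamic program, so the length-dependent lookup adds only constant work per rule application. For instance, `viterbi_log_prob("()x")` scores the single parse that pairs the brackets and appends the unpaired symbol, while `viterbi_log_prob(")(")` returns negative infinity because no parse exists.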

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Weinberg, F., Nebel, M.E. (2010). Extending Stochastic Context-Free Grammars for an Application in Bioinformatics. In: Dediu, AH., Fernau, H., Martín-Vide, C. (eds) Language and Automata Theory and Applications. LATA 2010. Lecture Notes in Computer Science, vol 6031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13089-2_49

  • DOI: https://doi.org/10.1007/978-3-642-13089-2_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13088-5

  • Online ISBN: 978-3-642-13089-2

  • eBook Packages: Computer Science, Computer Science (R0)
