Speeding up Parsing of Biological Context-Free Grammars

  • Conference paper
Combinatorial Pattern Matching (CPM 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3537)

Abstract

Grammars have been shown to be a very useful way to model families of biological sequences. As both the quantity of biological sequences and the complexity of biological grammars increase, generic and efficient parsing methods are needed. We consider two parsers for context-free grammars, a depth-first top-down parser and a chart parser, and we analyse and compare them, both theoretically and empirically, with respect to biological data. The theoretical comparison is based on a common feature of biological grammars, the gap: an element of a grammar designed to match any subsequence of the parsed string. The empirical comparison is based on grammars and sequences used by the bioinformatics community. Our conclusions are that: (1) the chart parsing algorithm is significantly faster than the depth-first top-down algorithm; (2) designing special treatments in the algorithms for managing gaps is useful; and (3) when using parsers not optimised for managing gaps, the way the grammar encodes gaps has to be chosen carefully to prevent large increases in running time.
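To make the notion of a gap concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical flat pattern encoding in which a gap carries a minimum and maximum length (the abstract does not specify any bounds), and it contrasts a plain depth-first search with a memoised variant that, in the spirit of a chart, solves each (element, position) pair only once. The names `PATTERN`, `match_ends`, and `match_ends_memo` are illustrative only.

```python
from functools import lru_cache

# Hypothetical encoding (an assumption, not taken from the paper):
#   ("lit", s)      matches the string s exactly
#   ("gap", lo, hi) matches any subsequence of length lo..hi
PATTERN = (("lit", "ATG"), ("gap", 0, 5), ("lit", "TAA"))


def match_ends(pattern, seq, start=0):
    """Depth-first top-down search: set of end positions reachable from start."""
    if not pattern:
        return {start}
    head, rest = pattern[0], pattern[1:]
    ends = set()
    if head[0] == "lit":
        if seq.startswith(head[1], start):
            ends |= match_ends(rest, seq, start + len(head[1]))
    else:  # gap: try every allowed length, backtracking on failure
        _, lo, hi = head
        for n in range(lo, hi + 1):
            if start + n > len(seq):
                break
            ends |= match_ends(rest, seq, start + n)
    return ends


def match_ends_memo(pattern, seq, start=0):
    """Same search, but each (element index, position) pair is solved once."""

    @lru_cache(maxsize=None)
    def go(i, pos):
        if i == len(pattern):
            return frozenset({pos})
        head = pattern[i]
        if head[0] == "lit":
            if seq.startswith(head[1], pos):
                return go(i + 1, pos + len(head[1]))
            return frozenset()
        _, lo, hi = head
        ends = set()
        for n in range(lo, hi + 1):
            if pos + n > len(seq):
                break
            ends |= go(i + 1, pos + n)
        return frozenset(ends)

    return set(go(0, start))
```

With several gaps in one pattern, the depth-first version may revisit the same suffix of the pattern at the same position many times, whereas the memoised version reuses the stored result. This is only an illustration of why tabulating intermediate results can help; it does not reproduce the paper's parsers or measurements.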

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fredouille, D., Bryant, C.H. (2005). Speeding up Parsing of Biological Context-Free Grammars. In: Apostolico, A., Crochemore, M., Park, K. (eds) Combinatorial Pattern Matching. CPM 2005. Lecture Notes in Computer Science, vol 3537. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11496656_21

  • DOI: https://doi.org/10.1007/11496656_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26201-5

  • Online ISBN: 978-3-540-31562-9

  • eBook Packages: Computer Science (R0)
