Computing Alignment Seed Sensitivity with Probabilistic Arithmetic Automata

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5251)

Abstract

Heuristic sequence alignment and database search algorithms, such as PatternHunter and BLAST, are based on the initial discovery of so-called alignment seeds of well-conserved alignment patterns, which are subsequently extended to full local alignments. In recent years, the theory of classical seeds (matching contiguous q-grams) has been extended to spaced seeds, which allow mismatches within a seed, and subsequently to indel seeds, which allow gaps in the underlying alignment.

Different seeds within a given class of seeds are usually compared by their sensitivity, that is, the probability of matching an alignment generated from a particular probabilistic alignment model.
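To make the definition of sensitivity concrete, here is a minimal Python sketch. It is not the authors' automaton construction. It assumes the simplest alignment model, in which each of the n alignment columns is independently a match with probability p, and it writes a spaced seed as a string of `1` (match required) and `0` (don't care). The function name `seed_sensitivity` and this encoding are illustrative assumptions.

```python
def seed_sensitivity(seed: str, n: int, p: float) -> float:
    """Probability that `seed` hits a random alignment of length n at least once.

    Assumed model (not taken from the paper): every alignment column is a
    match ("1") with probability p and a mismatch ("0") otherwise,
    independently of all other columns.
    """
    w = len(seed)
    if n < w:
        return 0.0
    required = [i for i, c in enumerate(seed) if c == "1"]

    # Map: last (w - 1) alignment symbols -> probability mass with no hit so far.
    no_hit = {"": 1.0}
    for _ in range(n):
        nxt = {}
        for hist, prob in no_hit.items():
            for bit, q in (("1", p), ("0", 1.0 - p)):
                window = hist + bit
                if len(window) == w and all(window[i] == "1" for i in required):
                    continue  # a hit occurred: this mass leaves the no-hit set
                key = window[-(w - 1):] if w > 1 else ""
                nxt[key] = nxt.get(key, 0.0) + prob * q
        no_hit = nxt
    return 1.0 - sum(no_hit.values())
```

For example, `seed_sensitivity("11", 3, 0.5)` evaluates to 0.375, because three of the eight binary strings of length 3 contain two adjacent matches. The state space grows as 2^(w-1) in the seed span w, so this direct approach is only practical for short seeds.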

We present a flexible, exact, unifying framework for seed sensitivity computation, called the probabilistic arithmetic automaton, that includes all previous results on spaced and indel seeds. In addition, we can easily incorporate sets of arbitrary seeds. Instead of only computing the probability of at least one hit (the standard definition of sensitivity), we can optionally provide the entire distribution of overlapping or non-overlapping seed hits, which yields a different characterization of a seed. A symbolic representation allows fast computation for any set of parameters.
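The idea of tracking a full hit-count distribution rather than a single probability can be sketched as follows. This is an illustrative dynamic program, not the probabilistic arithmetic automaton from the paper: each state pairs the recent alignment history with a running hit counter. It assumes independent alignment columns with match probability p, a seed written as `1` (match required) and `0` (don't care), and, for non-overlapping counting, a greedy left-to-right rule in which the history is cleared after each hit. The name `hit_count_distribution` and all parameters are hypothetical.

```python
from collections import defaultdict


def hit_count_distribution(seed: str, n: int, p: float,
                           overlapping: bool = True) -> list:
    """Return [P(0 hits), P(1 hit), ...] for a random alignment of length n.

    Assumed model: each column is a match ("1") with probability p,
    independently. Non-overlapping hits are counted greedily from the left.
    """
    w = len(seed)
    required = [i for i, c in enumerate(seed) if c == "1"]

    # State: (last w - 1 alignment symbols, number of hits so far).
    states = {("", 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (hist, count), prob in states.items():
            for bit, q in (("1", p), ("0", 1.0 - p)):
                window = hist + bit
                hit = len(window) == w and all(window[i] == "1" for i in required)
                if hit and not overlapping:
                    key = ""  # clear history so the next hit cannot overlap
                else:
                    key = window[-(w - 1):] if w > 1 else ""
                nxt[(key, count + int(hit))] += prob * q
        states = nxt

    # Sum out the history to obtain the distribution of the counter alone.
    dist = defaultdict(float)
    for (_, count), prob in states.items():
        dist[count] += prob
    return [dist[k] for k in range(max(dist) + 1)]
```

Under these assumptions, one minus the first entry of the returned list is the standard sensitivity, so the distribution strictly refines the single-number measure.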

Editor information

Keith A. Crandall Jens Lagergren

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Herms, I., Rahmann, S. (2008). Computing Alignment Seed Sensitivity with Probabilistic Arithmetic Automata. In: Crandall, K.A., Lagergren, J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87361-7_27

  • DOI: https://doi.org/10.1007/978-3-540-87361-7_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87360-0

  • Online ISBN: 978-3-540-87361-7

  • eBook Packages: Computer Science, Computer Science (R0)
