Procrastination Leads to Efficient Filtration for Local Multiple Alignment

  • Conference paper
Algorithms in Bioinformatics (WABI 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4175)

Abstract

We describe an efficient local multiple alignment filtration heuristic for identification of conserved regions in one or more DNA sequences. The method incorporates several novel ideas: (1) palindromic spaced seed patterns to match both DNA strands simultaneously, (2) seed extension (chaining) in order of decreasing multiplicity, and (3) procrastination when low-multiplicity matches are encountered. The resulting local multiple alignments may have nucleotide substitutions and internal gaps as large as w characters in any occurrence of the motif. The algorithm consumes \(\mathcal{O}(wN)\) memory and \(\mathcal{O}(wN \log wN)\) time, where N is the sequence length. We score the significance of multiple alignments using entropy-based motif scoring methods. We demonstrate the performance of our filtration method on Alu-repeat-rich segments of the human genome and a large set of Hepatitis C virus genomes. The GPL implementation of our algorithm in C++ is called procrastAligner and is freely available from http://gel.ahabs.wisc.edu/procrastination
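
The first two ideas in the abstract can be illustrated with a small sketch. It is not taken from procrastAligner: the seed pattern `1101011`, the function names, and the choice of the lexicographically smaller masked word as a strand-independent key are all assumptions made for illustration. The point it shows is that when the pattern reads the same in both directions, a window and its reverse complement are masked at the same positions, so a single index can collect matches from both strands. The resulting groups can then be visited in order of decreasing multiplicity.

```python
from collections import defaultdict

# Hypothetical palindromic spaced seed: '1' = position must match, '0' = wildcard.
PATTERN = "1101011"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def masked_word(window, pattern):
    """Keep characters at '1' positions, replace '0' positions with '*'."""
    return "".join(c if p == "1" else "*" for c, p in zip(window, pattern))


def canonical_seed(window, pattern=PATTERN):
    """Strand-independent key: the smaller of the two masked words.

    Because the pattern is palindromic, masking the reverse complement
    selects exactly the same underlying bases as masking the window.
    """
    fwd = masked_word(window, pattern)
    rev = masked_word(reverse_complement(window), pattern)
    return min(fwd, rev)


def seed_index(sequence, pattern=PATTERN):
    """Map each canonical seed key to the list of positions where it occurs."""
    if pattern != pattern[::-1]:
        raise ValueError("pattern must be palindromic")
    index = defaultdict(list)
    for i in range(len(sequence) - len(pattern) + 1):
        key = canonical_seed(sequence[i:i + len(pattern)], pattern)
        index[key].append(i)
    return index


def by_decreasing_multiplicity(index):
    """Return repeated seed groups, highest multiplicity first."""
    groups = [(key, pos) for key, pos in index.items() if len(pos) >= 2]
    return sorted(groups, key=lambda kp: len(kp[1]), reverse=True)


if __name__ == "__main__":
    word = "ACGGTCA"
    seq = word + "TTT" + reverse_complement(word)
    for key, positions in by_decreasing_multiplicity(seed_index(seq)):
        print(key, positions)
```

In this toy input the word at position 0 and its reverse complement at position 10 fall under the same key. The sketch does not cover the chaining step, the procrastination rule, or scoring, since the abstract does not describe them in enough detail to reproduce.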

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Darling, A.E., Treangen, T.J., Zhang, L., Kuiken, C., Messeguer, X., Perna, N.T. (2006). Procrastination Leads to Efficient Filtration for Local Multiple Alignment. In: Bücher, P., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2006. Lecture Notes in Computer Science, vol 4175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11851561_12

  • DOI: https://doi.org/10.1007/11851561_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39583-6

  • Online ISBN: 978-3-540-39584-3

  • eBook Packages: Computer Science (R0)
