Gapped Extension for Local Multiple Alignment of Interspersed DNA Repeats

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2008)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4983)

Abstract

The identification of homologous DNA is a fundamental building block of comparative genomic and molecular evolution studies. To date, pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with a previously described efficient filtration method for local multiple alignment. During gapped extension, we use the MUSCLE implementation of progressive multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any strand/species-symmetric nucleotide substitution matrix, and we have developed a method to adapt an arbitrary substitution matrix (e.g. HOXD) to organisms with different G+C content. We evaluate the performance of our method and previous approaches on a hybrid dataset of real genomic DNA with simulated interspersed repeats. Our method outperforms existing methods in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in the free, open-source procrastAligner software, available from: http://alggen.lsi.upc.es/recerca/align/procrastination
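
As a rough illustration of the idea of using an HMM to estimate a per-column posterior probability of homology, the sketch below runs forward-backward on a two-state model (homologous versus unrelated) over the columns of a pairwise alignment. It is not the procrastAligner implementation: the function name `posterior_homology`, the reduction of each column to match/mismatch, and all parameter values (`p_match_h`, `p_match_u`, `stay`) are assumptions chosen for readability, whereas the paper derives emission frequencies from a nucleotide substitution matrix.

```python
def posterior_homology(columns, p_match_h=0.8, p_match_u=0.25, stay=0.95):
    """Return P(homologous | whole alignment) for each alignment column.

    columns: sequence of booleans, True where the two aligned bases match.
    All probabilities here are illustrative assumptions, not fitted values.
    """
    n = len(columns)
    if n == 0:
        return []

    # Emission probability of each column under (homologous, unrelated).
    emit = [
        (p_match_h if m else 1.0 - p_match_h,
         p_match_u if m else 1.0 - p_match_u)
        for m in columns
    ]
    # trans[r][s] = probability of moving from state r to state s.
    trans = ((stay, 1.0 - stay), (1.0 - stay, stay))
    states = (0, 1)  # 0 = homologous, 1 = unrelated

    # Forward pass, normalised at every column to avoid underflow.
    fwd = []
    prev = (0.5, 0.5)  # uniform start distribution (assumption)
    for i, e in enumerate(emit):
        if i == 0:
            cur = [prev[s] * e[s] for s in states]
        else:
            cur = [e[s] * sum(prev[r] * trans[r][s] for r in states)
                   for s in states]
        z = sum(cur)
        prev = (cur[0] / z, cur[1] / z)
        fwd.append(prev)

    # Backward pass, normalised the same way.
    bwd = [(1.0, 1.0)] * n
    nxt = (1.0, 1.0)
    for i in range(n - 2, -1, -1):
        e = emit[i + 1]
        cur = [sum(trans[s][r] * e[r] * nxt[r] for r in states)
               for s in states]
        z = sum(cur)
        nxt = (cur[0] / z, cur[1] / z)
        bwd[i] = nxt

    # The scaling factors are the same for both states, so they cancel
    # when the product is renormalised at each column.
    post = []
    for f, b in zip(fwd, bwd):
        h, u = f[0] * b[0], f[1] * b[1]
        post.append(h / (h + u))
    return post


# Example: 20 matching columns followed by 20 mismatching columns.
example = posterior_homology([True] * 20 + [False] * 20)
```

Columns whose posterior falls below a chosen threshold could then be trimmed from the extension; the threshold itself would be another assumption, since the abstract does not state one.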

Editor information

Ion Măndoiu Raj Sunderraman Alexander Zelikovsky

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Treangen, T.J., Darling, A.E., Ragan, M.A., Messeguer, X. (2008). Gapped Extension for Local Multiple Alignment of Interspersed DNA Repeats. In: Măndoiu, I., Sunderraman, R., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2008. Lecture Notes in Computer Science, vol 4983. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79450-9_8

  • DOI: https://doi.org/10.1007/978-3-540-79450-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-79449-3

  • Online ISBN: 978-3-540-79450-9

  • eBook Packages: Computer Science, Computer Science (R0)