Gapped Extension for Local Multiple Alignment of Interspersed DNA Repeats

Treangen, Todd J.; Darling, Aaron E.; Ragan, Mark A.; Messeguer, Xavier

doi:10.1007/978-3-540-79450-9_8

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4983)

Included in the following conference series:

International Symposium on Bioinformatics Research and Applications

Abstract

The identification of homologous DNA is a fundamental building block of comparative genomic and molecular evolution studies. To date, pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with a previously described efficient filtration method for local multiple alignment. During gapped extension, we use the MUSCLE implementation of progressive multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any strand/species-symmetric nucleotide substitution matrix, and we have developed a method to adapt an arbitrary substitution matrix (e.g., HOXD) to organisms with different G+C content. We evaluate the performance of our method and previous approaches on a hybrid dataset of real genomic DNA with simulated interspersed repeats. Our method outperforms existing methods in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in the free, open-source procrastAligner software, available from: http://alggen.lsi.upc.es/recerca/align/procrastination
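To make the homology-detection step concrete, the following is a minimal sketch of posterior decoding with a two-state HMM (homologous versus unrelated) over the columns of a pairwise alignment. It is not the procrastAligner model: the abstract does not give the HMM topology or parameters, so the state names, the three column classes (match, mismatch, gap), and every probability in `EMIT`, `TRANS`, and `START` are assumptions chosen for illustration only. In particular, the emissions here are not derived from a substitution matrix such as HOXD.

```python
# Toy two-state HMM: "H" = homologous, "U" = unrelated.
# All probabilities below are hypothetical illustration values.
EMIT = {
    "H": {"M": 0.80, "X": 0.15, "G": 0.05},  # homologous columns mostly match
    "U": {"M": 0.25, "X": 0.60, "G": 0.15},  # unrelated columns match by chance
}
TRANS = {
    "H": {"H": 0.95, "U": 0.05},
    "U": {"H": 0.05, "U": 0.95},
}
START = {"H": 0.5, "U": 0.5}
STATES = ("H", "U")


def classify_columns(row_a, row_b):
    """Map two aligned rows to column classes: M(atch), X (mismatch), G(ap)."""
    classes = []
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            classes.append("G")
        elif x == y:
            classes.append("M")
        else:
            classes.append("X")
    return classes


def posterior_homology(columns, emit=EMIT, trans=TRANS, start=START):
    """Return P(state = 'H' | all columns) per column via scaled forward-backward."""
    n = len(columns)
    if n == 0:
        return []

    # Forward pass; each row is normalised to avoid numerical underflow.
    fwd, scale = [], []
    for i, c in enumerate(columns):
        row = {}
        for s in STATES:
            if i == 0:
                prior = start[s]
            else:
                prior = sum(fwd[i - 1][r] * trans[r][s] for r in STATES)
            row[s] = prior * emit[s][c]
        z = sum(row.values())
        fwd.append({s: v / z for s, v in row.items()})
        scale.append(z)

    # Backward pass, using the same scaling factors.
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in STATES}
    for i in range(n - 2, -1, -1):
        c = columns[i + 1]
        bwd[i] = {
            s: sum(trans[s][r] * emit[r][c] * bwd[i + 1][r] for r in STATES)
            / scale[i + 1]
            for s in STATES
        }

    # Combine and renormalise to get the per-column posterior of homology.
    post = []
    for i in range(n):
        h = fwd[i]["H"] * bwd[i]["H"]
        u = fwd[i]["U"] * bwd[i]["U"]
        post.append(h / (h + u))
    return post
```

Under these assumptions, columns whose posterior falls below a chosen threshold would be treated as unrelated and trimmed from the gapped extension; the threshold itself is another free parameter that the abstract does not specify.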

Author information

Authors and Affiliations

Dept. of Computer Science, Polytechnic University of Catalonia, Barcelona, Spain
Todd J. Treangen & Xavier Messeguer
ARC Centre of Excellence in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
Aaron E. Darling & Mark A. Ragan

Editor information

Ion Măndoiu Raj Sunderraman Alexander Zelikovsky

About this paper

Cite this paper

Treangen, T.J., Darling, A.E., Ragan, M.A., Messeguer, X. (2008). Gapped Extension for Local Multiple Alignment of Interspersed DNA Repeats. In: Măndoiu, I., Sunderraman, R., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2008. Lecture Notes in Computer Science, vol 4983. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79450-9_8

DOI: https://doi.org/10.1007/978-3-540-79450-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-79449-3
Online ISBN: 978-3-540-79450-9
eBook Packages: Computer Science, Computer Science (R0)