Multiple Genome Alignment by Clustering Pairwise Matches

Choi, Jeong-Hyeon; Choi, Kwangmin; Cho, Hwan-Gue; Kim, Sun

doi:10.1007/978-3-540-32290-0_3

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3388)

Included in the following conference series:

RECOMB Workshop on Comparative Genomics

Abstract

We have developed a multiple genome alignment algorithm that uses a sequence clustering algorithm to combine the local pairwise genome sequence matches produced by pairwise genome alignment tools, e.g., BLASTZ. Sequence clustering algorithms often generate clusters of sequences such that a common region is shared among all sequences in each cluster. To use a sequence clustering algorithm for genome alignment, it is necessary to handle the numerous local alignments between each pair of genomes. We propose a multiple genome alignment method that converts the multiple genome alignment problem into a sequence clustering problem. This method does not need a guide tree to determine the order of multiple alignment, and it accurately detects multiple homologous regions. As a result, our multiple genome alignment algorithm performs competitively with existing algorithms. We show this in an experiment that compares the performance of TBA, MultiPipMaker (MPM), and our algorithm in aligning 12 groups of 56 microbial genomes, evaluated by the number of common COGs detected.
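The idea of turning pairwise matches into multi-genome groups without a guide tree can be illustrated with a small sketch. This is not the authors' clustering algorithm: it substitutes plain connected components (union-find), and the input format, the function name `cluster_matches`, and the overlap rule are all assumptions made for illustration. Unlike the clustering described in the abstract, connected components do not guarantee a region shared by every member of a cluster.

```python
from collections import defaultdict


def cluster_matches(matches):
    """Group pairwise local matches into multi-genome clusters.

    Each match is a pair of regions, and each region is a tuple
    (genome, start, end) with half-open coordinates. This input
    format is hypothetical; it is not the BLASTZ output format.
    """
    regions = sorted({region for match in matches for region in match})
    parent = {region: region for region in regions}

    def find(region):
        # Follow parent links to the representative, halving the path.
        while parent[region] != region:
            parent[region] = parent[parent[region]]
            region = parent[region]
        return region

    def union(a, b):
        parent[find(a)] = find(b)

    # Step 1: the two sides of a pairwise match belong together.
    for a, b in matches:
        union(a, b)

    # Step 2: regions that overlap on the same genome belong together.
    by_genome = defaultdict(list)
    for region in regions:
        by_genome[region[0]].append(region)
    for genome_regions in by_genome.values():
        genome_regions.sort(key=lambda r: r[1])
        current, reach = genome_regions[0], genome_regions[0][2]
        for region in genome_regions[1:]:
            if region[1] < reach:          # overlaps the current run
                union(current, region)
                reach = max(reach, region[2])
            else:                          # start a new run
                current, reach = region, region[2]

    # Collect the members of each connected component.
    clusters = defaultdict(set)
    for region in regions:
        clusters[find(region)].add(region)
    return sorted(sorted(members) for members in clusters.values())


# Toy input: three pairwise matches among hypothetical genomes A, B, C.
matches = [
    (("A", 100, 200), ("B", 50, 150)),
    (("B", 120, 220), ("C", 10, 110)),
    (("A", 500, 600), ("C", 400, 500)),
]
for cluster in cluster_matches(matches):
    print(cluster)
```

In the toy input, the A-B and B-C matches are linked because their regions on genome B overlap, so one cluster spans all three genomes even though no A-C match was given. No alignment order had to be chosen to reach that grouping.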

References

Kellis, M., Patterson, N., Endrizzi, M., Birren, B., Lander, E.: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423, 241–254 (2003)
Pevzner, P., Tesler, G.: Human and mouse genomic sequences reveal extensive breakpoint reuse in mammalian evolution. Proc. Natl. Acad. Sci. U.S.A. 100, 7672–7677 (2003)
Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.C., Haussler, D., Miller, W.: Human-mouse alignments with BLASTZ. Genome Res. 13, 103–107 (2003)
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)
Smith, T.F., Waterman, M.S.: Identification of common molecular sequences. J. Mol. Biol. 147, 195–197 (1981)
Lipman, D.J., Altschul, S.F., Kececioglu, J.D.: A tool for multiple sequence alignment. Proc. Natl. Acad. Sci. U.S.A. 86, 4412–4415 (1989)
Thompson, J., Higgins, D., Gibson, T.: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994)
Corpet, F.: Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16, 10881–10890 (1988)
Gotoh, O.: Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J. Mol. Biol. 264, 823–838 (1996)
Notredame, C., Higgins, D.: SAGA: sequence alignment by genetic algorithm. Nucleic Acids Res. 24, 1515–1524 (1996)
Kim, J., Pramanik, S., Chung, M.: Multiple sequence alignment using simulated annealing. Comput. Appl. Biosci. 10, 419–426 (1994)
Höhl, M., Kurtz, S., Ohlebusch, E.: Efficient multiple genome alignment. Bioinformatics 18, S312–S320 (2002)
Morgenstern, B., Frech, K., Dress, A., Werner, T.: DIALIGN: Finding local similarities by multiple sequence alignment. Bioinformatics 14, 290–294 (1998)
Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E.: LAGAN and Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13, 721–731 (2003)
Bray, N., Pachter, L.: MAVID: Constrained ancestral alignment of multiple sequences. Genome Res. 14, 693–699 (2004)
Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., Haussler, D., Miller, W.: Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14, 708–715 (2004)
Schwartz, S., Elnitski, L., Li, M., Weirauch, M., Riemer, C., Smit, A., Program, N.C.S., Green, E.D., Hardison, R.C., Miller, W.: MultiPipMaker and supporting tools: alignments and analysis of multiple genomic DNA sequences. Nucleic Acids Res. 31, 3518–3524 (2003)
Kim, S.: Graph theoretic sequence clustering algorithms and their applications to genome comparison. In: Wu, C.H., Wang, P., Wang, J.T.L. (eds.) Computational Biology and Genome Informatics. World Scientific, Singapore (2003)
Kim, S., Gopu, A.: Cluster utility: A new metric to guide sequence clustering. Technical report, School of Informatics, Indiana University (2004)
Miller, W.: Comparison of genomic DNA sequences: Solved and unsolved problems. Bioinformatics 17, 391–397 (2001)

Author information

Authors and Affiliations

School of Informatics, Indiana University, IN, 47408, USA
Jeong-Hyeon Choi, Kwangmin Choi & Sun Kim
Center for Genomics and Bioinformatics, Indiana University, IN, 47405, USA
Sun Kim
Department of Computer Science and Engineering, Pusan National University, Korea
Jeong-Hyeon Choi & Hwan-Gue Cho

Editor information

Editors and Affiliations

KTH, Royal Institute of Technology, Stockholm, Sweden
Jens Lagergren

Cite this paper

Choi, JH., Choi, K., Cho, HG., Kim, S. (2005). Multiple Genome Alignment by Clustering Pairwise Matches. In: Lagergren, J. (eds) Comparative Genomics. RCG 2004. Lecture Notes in Computer Science, vol 3388. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32290-0_3

DOI: https://doi.org/10.1007/978-3-540-32290-0_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24455-4
Online ISBN: 978-3-540-32290-0
eBook Packages: Computer Science (R0)
