A Polynomial Time Solvable Formulation of Multiple Sequence Alignment

Sze, Sing-Hoi; Lu, Yue; Yang, Qingwu

doi:10.1007/11415770_16

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3500)

Abstract

Since traditional multiple alignment formulations are NP-hard, heuristics are commonly employed to find acceptable alignments with no guaranteed performance bound. This makes it difficult to understand what a resulting alignment means and to assess its quality. We propose an alternative formulation of multiple alignment based on finding a multiple alignment of k sequences that preserves the k – 1 pairwise alignments specified by the edges of a given tree. Although it is well known that such a preserving alignment always exists, it has not become a mainstream method for multiple alignment, since it seems that a lot of information is lost by ignoring pairwise similarities outside the tree. In contrast, we show that by using pairwise alignments that incorporate consistency information from other sequences, it is possible to obtain very good accuracy with the preserving alignment formulation. We show that a reasonable objective is to find the shortest preserving alignment and, by a reduction to a graph-theoretic problem, that the shortest preserving multiple alignment can be found in polynomial time. We demonstrate the success of this approach on three sets of benchmark multiple alignments by taking the consistency-based pairwise alignments from the first stage of two of the best-performing progressive alignment algorithms, TCoffee and PROBCONS, and replacing the second, heuristic progressive step of these algorithms with the exact preserving alignment step (we ignore the iterative refinement step in this study). Applied to TCoffee, the new approach outperforms TCoffee on two of the three test sets. Applied to a variant of PROBCONS with no iterative refinements, the new approach achieves similar accuracy except on one test set.
The most important advantage of the preserving alignment formulation is that the problem can be solved exactly in polynomial time, without resorting to a heuristic. A software program implementing this approach (PSAlign) is available at http://faculty.cs.tamu.edu/shsze.
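
To make the notion of a preserving alignment concrete, the following is a minimal Python sketch of the existence claim only: given k sequences and k – 1 pairwise alignments on the edges of a tree, it builds some multiple alignment that preserves every one of them. It is not the PSAlign algorithm and it makes no attempt to find the shortest preserving alignment. The input representation (a dict of sequences and a dict of aligned index pairs per tree edge) and the function names `preserving_alignment` and `induced_pairs` are assumptions made for illustration.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def preserving_alignment(seqs, tree_edges):
    """Build one multiple alignment that preserves every pairwise alignment.

    seqs:       dict name -> sequence string (hypothetical input format)
    tree_edges: dict (name_a, name_b) -> list of aligned index pairs (i, j);
                the edges are assumed to form a tree over the sequence names.
    """
    # Union-find over residues: residues aligned along tree edges share a column.
    parent = {(n, i): (n, i) for n, s in seqs.items() for i in range(len(s))}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), pairs in tree_edges.items():
        for i, j in pairs:
            parent[find((a, i))] = find((b, j))

    # Column graph: the column of residue i-1 must precede the column of residue i.
    preds = defaultdict(set)
    for n, s in seqs.items():
        for i in range(len(s)):
            col = find((n, i))
            preds[col]  # register the column even if it has no predecessor
            if i > 0:
                preds[col].add(find((n, i - 1)))

    # Any topological order of the columns is a valid left-to-right layout.
    order = list(TopologicalSorter(preds).static_order())
    col_index = {c: k for k, c in enumerate(order)}

    rows = {}
    for n, s in seqs.items():
        row = ["-"] * len(order)
        for i, ch in enumerate(s):
            row[col_index[find((n, i))]] = ch
        rows[n] = "".join(row)
    return rows


def induced_pairs(rows, a, b):
    """Index pairs of sequences a and b whose residues share a column."""
    pairs, i, j = [], 0, 0
    for x, y in zip(rows[a], rows[b]):
        if x != "-" and y != "-":
            pairs.append((i, j))
        i += x != "-"
        j += y != "-"
    return pairs


# Toy example: a path-shaped tree A - B - C with made-up pairwise alignments.
seqs = {"A": "ACGT", "B": "AGT", "C": "ACT"}
edges = {("A", "B"): [(0, 0), (2, 1), (3, 2)], ("B", "C"): [(0, 0), (2, 2)]}
rows = preserving_alignment(seqs, edges)

# Each tree edge's pairwise alignment is recovered exactly from the result.
assert induced_pairs(rows, "A", "B") == edges[("A", "B")]
assert induced_pairs(rows, "B", "C") == edges[("B", "C")]
```

This construction never places two residues in the same column unless a chain of tree-edge alignments forces it, so it tends to produce long alignments. The shortest preserving alignment described in the abstract would additionally merge compatible columns, which is the part the paper handles through its graph-theoretic reduction and which this sketch leaves out.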

Author information

Authors and Affiliations

Department of Computer Science, Texas A&M University, College Station, TX, 77843, USA
Sing-Hoi Sze & Qingwu Yang
Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, 77843, USA
Sing-Hoi Sze & Yue Lu

Editor information

Editors and Affiliations

Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, 108-8639, Minato-ku, Tokyo, Japan
Satoru Miyano
Broad Institute of MIT and Harvard, 320 Charles Street, 02141-2023, Cambridge, MA, USA
Jill Mesirov
Computational Genomics Laboratory, Department of Bioengineering, Boston University, 44 Cummington St., 02215, Boston, MA, USA
Simon Kasif
Center for Molecular Biology and Computer Science Department, Brown University, 115 Waterman St., 02912, Providence, RI, USA
Sorin Istrail
University of California, San Diego, USA
Pavel A. Pevzner
Department of Molecular and Computational Biology, University of Southern California, 1050 Childs Way, 90089-2910, Los Angeles, CA, USA
Michael Waterman

About this paper

Cite this paper

Sze, S.-H., Lu, Y., Yang, Q. (2005). A Polynomial Time Solvable Formulation of Multiple Sequence Alignment. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science, vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_16

DOI: https://doi.org/10.1007/11415770_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25866-7
Online ISBN: 978-3-540-31950-4
eBook Packages: Computer Science, Computer Science (R0)
