The maximum weight trace problem in multiple sequence alignment

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 684)

Abstract

We define a new problem in multiple sequence alignment, called maximum weight trace. The problem formalizes in a natural way the common practice of merging pairwise alignments to form multiple sequence alignments, and contains a version of the minimum sum of pairs alignment problem as a special case.

Informally, the input is a set of pairs of matched characters from the sequences; each pair has an associated weight. The output is a subset of the pairs of maximum total weight that satisfies the following property: there is a multiple alignment that places each pair of characters selected by the subset together in the same column. A set of pairs with this property is called a trace. Intuitively, a trace of maximum weight specifies a multiple alignment that agrees as much as possible with the character matches of the input.
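The trace property can be made concrete with a small feasibility check. The following Python sketch is an illustration only, not code from the paper: it assumes characters are identified by hypothetical `(sequence, position)` tuples with 0-based indices, merges characters joined by selected pairs into candidate columns, and then tests whether those columns can be ordered consistently with the left-to-right order of every sequence.

```python
from collections import defaultdict

def is_trace(lengths, pairs):
    """Return True if some multiple alignment puts every pair in one column.

    lengths[s] is the length of sequence s; pairs is an iterable of
    ((s, i), (t, j)) character pairs. Both encodings are assumptions
    made for this sketch.
    """
    # Union-find over all characters: pairs sharing a character must
    # end up in the same column, so we merge them into one group.
    parent = {(s, i): (s, i) for s, n in enumerate(lengths) for i in range(n)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {find(x) for x in list(parent)}
    succ = defaultdict(set)
    indeg = {g: 0 for g in groups}

    # Each character must come strictly before its successor in the
    # same sequence, which gives an arc between their groups.
    for s, n in enumerate(lengths):
        for i in range(n - 1):
            u, v = find((s, i)), find((s, i + 1))
            if u == v:  # two characters of one sequence in one column
                return False
            if v not in succ[u]:
                succ[u].add(v)
                indeg[v] += 1

    # Topological sort: the columns can be ordered iff there is no cycle.
    ready = [g for g in groups if indeg[g] == 0]
    ordered = 0
    while ready:
        u = ready.pop()
        ordered += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return ordered == len(groups)
```

For two sequences of length 2, the parallel pairs `((0, 0), (1, 0))` and `((0, 1), (1, 1))` form a trace, while the crossing pairs `((0, 0), (1, 1))` and `((0, 1), (1, 0))` do not, since no alignment can place both crossing matches in columns.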

We develop a branch and bound algorithm for maximum weight trace. Though the problem is NP-complete, an implementation of the algorithm shows we can solve instances on as many as 6 sequences of length 250 in a few minutes. These are among the largest instances that have been solved to optimality to date for any formulation of multiple sequence alignment.
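To illustrate the general shape of a branch and bound search for this problem, here is a deliberately naive Python sketch. It is not the algorithm of the paper, whose branching and bounding rules are not given in the abstract. It assumes non-negative weights, 0-based `(sequence, position)` tuples for characters, include/exclude branching on one pair at a time, and a simple upper bound equal to the current weight plus the weight of all undecided pairs. The helper `_is_trace` and all other names are hypothetical.

```python
from collections import defaultdict

def _is_trace(lengths, pairs):
    """Feasibility check: merge paired characters, then test for cycles."""
    parent = {(s, i): (s, i) for s, n in enumerate(lengths) for i in range(n)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {find(x) for x in list(parent)}
    succ, indeg = defaultdict(set), {g: 0 for g in groups}
    for s, n in enumerate(lengths):
        for i in range(n - 1):
            u, v = find((s, i)), find((s, i + 1))
            if u == v:
                return False
            if v not in succ[u]:
                succ[u].add(v)
                indeg[v] += 1
    ready = [g for g in groups if indeg[g] == 0]
    ordered = 0
    while ready:
        u = ready.pop()
        ordered += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return ordered == len(groups)

def max_weight_trace(lengths, weighted_pairs):
    """weighted_pairs: list of (weight, (s, i), (t, j)), weights >= 0."""
    items = sorted(weighted_pairs, key=lambda p: -p[0])  # heavy pairs first
    # suffix[k] = total weight of items[k:], used as the upper bound.
    suffix = [0] * (len(items) + 1)
    for k in range(len(items) - 1, -1, -1):
        suffix[k] = suffix[k + 1] + items[k][0]
    best = [0, []]  # best weight found so far and the pairs achieving it

    def branch(k, chosen, weight):
        if weight > best[0]:
            best[0], best[1] = weight, list(chosen)
        # Prune when even taking every remaining pair cannot beat best.
        if k == len(items) or weight + suffix[k] <= best[0]:
            return
        w, a, b = items[k]
        chosen.append((a, b))
        if _is_trace(lengths, chosen):  # a subset of a trace is a trace
            branch(k + 1, chosen, weight + w)
        chosen.pop()
        branch(k + 1, chosen, weight)

    branch(0, [], 0)
    return best[0], best[1]
```

The search is exponential in the worst case, consistent with the NP-completeness noted above; a practical solver would need much stronger bounds than the one assumed here.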

Editor information

Alberto Apostolico, Maxime Crochemore, Zvi Galil, Udi Manber

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kececioglu, J. (1993). The maximum weight trace problem in multiple sequence alignment. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1993. Lecture Notes in Computer Science, vol 684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0029800

  • DOI: https://doi.org/10.1007/BFb0029800

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56764-6

  • Online ISBN: 978-3-540-47732-7

  • eBook Packages: Springer Book Archive
