Abstract
While the area of sequence comparison has a rich collection of results on the alignment of two sequences, and even the alignment of multiple sequences, little is known about the alignment of two alignments. The problem becomes interesting when the alignment objective function counts gaps, as is common when aligning biological sequences, and has the form of the sum-of-pairs objective. We begin a thorough investigation of aligning two alignments under the sum-of-pairs objective with general linear gap costs when each of the two alignments is given in the form of a sequence (a degenerate alignment containing a single sequence), a multiple alignment (containing two or more sequences), or a profile (a representation of a multiple alignment often used in computational biology). This leads to five problem variations, some of which arise in widely used heuristics for multiple sequence alignment, and in assessing the relatedness of a sequence to a sequence family. For variations in which exact gap counts are computationally difficult to determine, we offer a framework in terms of optimistic and pessimistic gap counts. For optimistic and pessimistic gap counts we give efficient algorithms for the sequence vs. alignment, sequence vs. profile, alignment vs. alignment, and profile vs. profile variations, all of which run in essentially O(mn) time for two input alignments of lengths m and n. For exact gap counts, we give the first provably efficient algorithm for the sequence vs. alignment variation, which runs in essentially O(mn log n) time using the candidate-list technique developed for convex gap costs, and we conjecture that the alignment vs. alignment variation is NP-complete.
Research supported in part by a National Science Foundation CAREER Award, Grant DBI-9722339.
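To make the objective in the abstract concrete, the following is a minimal Python sketch of how a sum-of-pairs cost with linear gap costs can be evaluated on a fixed multiple alignment. It is not one of the paper's algorithms: it only scores an alignment that is already given. The function names, the gap character "-", and the parameter values (mismatch, gap_open, gap_extend) are illustrative assumptions, as is the convention that a gap of length k costs gap_open + k * gap_extend.

```python
from itertools import combinations

GAP = "-"  # assumed gap character


def induced_pair(row_a, row_b):
    """Project two rows of a multiple alignment onto a pairwise alignment,
    dropping columns in which both rows contain a gap."""
    return [(x, y) for x, y in zip(row_a, row_b)
            if not (x == GAP and y == GAP)]


def count_gaps(pair_columns):
    """Count maximal runs of gap characters in a pairwise alignment.
    A run in one row that is immediately followed by a run in the other
    row counts as two gaps."""
    gaps = 0
    prev = None  # row gapped in the previous column: 0, 1, or None
    for x, y in pair_columns:
        cur = 0 if x == GAP else 1 if y == GAP else None
        if cur is not None and cur != prev:
            gaps += 1
        prev = cur
    return gaps


def pair_cost(pair_columns, mismatch=1, gap_open=3, gap_extend=1):
    """Cost of one induced pairwise alignment (hypothetical parameter values)."""
    cost = gap_open * count_gaps(pair_columns)
    for x, y in pair_columns:
        if x == GAP or y == GAP:
            cost += gap_extend
        elif x != y:
            cost += mismatch
    return cost


def sum_of_pairs_cost(rows, **params):
    """Sum the induced pairwise costs over all pairs of rows."""
    return sum(pair_cost(induced_pair(a, b), **params)
               for a, b in combinations(rows, 2))


rows = ["AC-GT", "A--GT", "ACCGT"]
print(sum_of_pairs_cost(rows))
```

Because columns in which both rows have a gap are dropped before gaps are counted, the number of gaps in an induced pairwise alignment depends on the whole pair of rows rather than on any single column. This is why gap counts are the delicate part of the objective once two alignments are merged.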
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kececioglu, J.D., Zhang, W. (1998). Aligning alignments. In: Farach-Colton, M. (ed.) Combinatorial Pattern Matching. CPM 1998. Lecture Notes in Computer Science, vol 1448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0030790
DOI: https://doi.org/10.1007/BFb0030790
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64739-3
Online ISBN: 978-3-540-69054-2
eBook Packages: Springer Book Archive