Aligning alignments

  • Session IV
  • Conference paper
Combinatorial Pattern Matching (CPM 1998)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1448))

Abstract

While the area of sequence comparison has a rich collection of results on the alignment of two sequences, and even on the alignment of multiple sequences, little is known about the alignment of two alignments. The problem becomes interesting when the alignment objective function counts gaps, as is common when aligning biological sequences, and has the form of the sum-of-pairs objective. We begin a thorough investigation of aligning two alignments under the sum-of-pairs objective with general linear gap costs when each of the two alignments is given in the form of a sequence (a degenerate alignment containing a single sequence), a multiple alignment (containing two or more sequences), or a profile (a representation of a multiple alignment often used in computational biology). This leads to five problem variations, some of which arise in widely used heuristics for multiple sequence alignment and in assessing the relatedness of a sequence to a sequence family. For variations in which exact gap counts are computationally difficult to determine, we offer a framework in terms of optimistic and pessimistic gap counts. For optimistic and pessimistic gap counts we give efficient algorithms for the sequence vs. alignment, sequence vs. profile, alignment vs. alignment, and profile vs. profile variations, all of which run in essentially O(mn) time for two input alignments of lengths m and n. For exact gap counts, we give the first provably efficient algorithm for the sequence vs. alignment variation, which runs in essentially O(mn log n) time using the candidate-list technique developed for convex gap costs, and we conjecture that the alignment vs. alignment variation is NP-complete.
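
To make the objective concrete, the following is a minimal sketch of how a sum-of-pairs cost with gap counting might be evaluated on a fixed multiple alignment. It is an illustration only, not the algorithm of the paper: the function names, the unit mismatch cost, and the parameters `gap_open` and `gap_extend` are assumptions chosen for the example, and "linear gap cost" is read here as a per-gap charge plus a per-character charge. Each pair of rows is projected onto its induced pairwise alignment by dropping columns in which both rows hold a gap character, and a gap is a maximal run of gap characters in one row of that projection.

```python
from itertools import combinations

GAP = "-"

def pair_cost(row_a, row_b, mismatch=1, gap_open=2, gap_extend=1):
    """Cost of the pairwise alignment induced by two rows of a multiple alignment.

    All cost parameters are illustrative assumptions, not values from the paper.
    """
    cost = 0
    open_side = None  # row that currently has an open gap: "a", "b", or None
    for x, y in zip(row_a, row_b):
        if x == GAP and y == GAP:
            # The column vanishes in the projection, so an open gap stays open.
            continue
        if x == GAP or y == GAP:
            side = "a" if x == GAP else "b"
            if side != open_side:
                cost += gap_open  # a new gap starts: charge it once
                open_side = side
            cost += gap_extend    # charge every gap character
        else:
            cost += 0 if x == y else mismatch
            open_side = None      # a letter-letter column closes any open gap
    return cost

def sum_of_pairs_cost(rows, **costs):
    """Sum the induced pairwise costs over all unordered pairs of rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(pair_cost(a, b, **costs) for a, b in combinations(rows, 2))

example = ["AC-GT", "A--GT", "ACCGT"]
total = sum_of_pairs_cost(example)
```

With the default costs above, `total` for the three example rows is 10. The call `pair_cost("A---C", "AG-TC")` gives 4 rather than 6, because the column where both rows hold a gap disappears in the projection and the two surrounding gap characters merge into one gap. That dependence of the gap count on which columns vanish for which pair is what makes exact gap counts awkward once two alignments are being merged, and it is the kind of uncertainty that optimistic and pessimistic counts are meant to bound.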

Research supported in part by a National Science Foundation CAREER Award, Grant DBI-9722339.

Editor information

Martin Farach-Colton

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kececioglu, J.D., Zhang, W. (1998). Aligning alignments. In: Farach-Colton, M. (ed.) Combinatorial Pattern Matching. CPM 1998. Lecture Notes in Computer Science, vol 1448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0030790

  • DOI: https://doi.org/10.1007/BFb0030790

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64739-3

  • Online ISBN: 978-3-540-69054-2

  • eBook Packages: Springer Book Archive
