Abstract
Several heuristic algorithms have been proposed for the multiple alignment of sequences. The most time-efficient are often greedy algorithms. At each step, a greedy alignment algorithm must know whether two characters are alignable, given the characters already definitively aligned. We show that this problem reduces to finding paths in a directed graph. We give an incremental algorithm that maintains the transitive closure of a graph for which a spanning set of k disjoint paths is known. Our algorithm maintains the transitive closure of a graph of n vertices and m edges (in the final state) in $O(k^2 m + n \min\{m, n\})$ time and $O(kn)$ space. We show that any greedy alignment algorithm can use this algorithm to decide in constant time whether two characters are alignable, by maintaining the transitive closure of an alignment graph in $O(k^2 n + n^2)$ time and $O(kn)$ space, for k sequences of total length n. As an example application, we have implemented TwoAlign, an efficient multiple alignment program based on the greedy computation of pairwise local alignments.
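The idea of storing a transitive closure in O(kn) space when the vertices lie on k disjoint paths can be illustrated with a minimal Python sketch. It is not the algorithm of the paper and makes no attempt to reach the time bounds stated above: each edge insertion here costs O(kn) in the worst case. The class name PathCoverClosure and its methods are hypothetical, and the graph is assumed to stay acyclic. For each vertex and each path, the sketch keeps only the smallest index on that path that the vertex can reach, which is enough to answer a reachability query in constant time.

```python
import math


class PathCoverClosure:
    """Reachability in a DAG whose vertices lie on k disjoint paths.

    Illustrative sketch only (hypothetical names, simple update rule).
    Vertex (s, i) is the i-th vertex of path s, and every path implicitly
    contains the edges (s, i) -> (s, i + 1).
    succ[s][i][t] is the smallest j such that (t, j) is reachable from
    (s, i), or math.inf if no vertex of path t is reachable.
    """

    def __init__(self, lengths):
        self.lengths = list(lengths)
        k = len(self.lengths)
        # Initially a vertex reaches only itself and later vertices of its path.
        self.succ = [
            [[i if t == s else math.inf for t in range(k)] for i in range(n)]
            for s, n in enumerate(self.lengths)
        ]

    def reaches(self, u, v):
        """Constant-time query: is there a (possibly empty) path from u to v?"""
        (s, i), (t, j) = u, v
        return self.succ[s][i][t] <= j

    def add_edge(self, u, v):
        """Insert the edge u -> v and update the stored closure."""
        if self.reaches(v, u):
            raise ValueError("edge would create a cycle")
        (s, i), (t, j) = u, v
        # Everything v reaches; unchanged by the insertion since v cannot reach u.
        target = list(self.succ[t][j])
        for a, n in enumerate(self.lengths):
            # On each path, the vertices that reach u form a prefix, because
            # succ[a][p][s] is non-decreasing in p.
            for p in range(n):
                row = self.succ[a][p]
                if row[s] > i:
                    break
                for b, bound in enumerate(target):
                    if bound < row[b]:
                        row[b] = bound


closure = PathCoverClosure([3, 3])      # two paths with three vertices each
closure.add_edge((0, 1), (1, 1))
print(closure.reaches((0, 0), (1, 2)))  # True: (0,0) -> (0,1) -> (1,1) -> (1,2)
print(closure.reaches((1, 1), (0, 1)))  # False
```

In a greedy aligner, a query of this kind would be the constant-time test asked before each candidate step: whether a path already exists between two characters. The exact definition of the alignment graph and of alignability is given in the paper and is not reproduced in this sketch.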
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abdeddaïm, S. (1997). On incremental computation of transitive closure and greedy alignment. In: Apostolico, A., Hein, J. (eds) Combinatorial Pattern Matching. CPM 1997. Lecture Notes in Computer Science, vol 1264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63220-4_58
DOI: https://doi.org/10.1007/3-540-63220-4_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63220-7
Online ISBN: 978-3-540-69214-0
eBook Packages: Springer Book Archive