Abstract
A sensitive method for multiple sequence alignment should be able to align local motifs that are contained in some, but not necessarily all, of the input sequences. In addition, it should be possible to integrate several such partial local alignments into a single multiple output alignment. This leads to the question of consistency of partial alignments. Based on a new set-theoretical definition of sequence alignment, the consistency problem is discussed theoretically, and a recently developed library of C functions for consistency calculation (GABIOS-LIB) is described. GABIOS-LIB has been integrated into the DIALIGN alignment program to carry out consistency tests during the multiple alignment procedure. While the resulting alignments are exactly the same as with the previous version of DIALIGN, the running time of the program has been substantially reduced. For large data sets, the new version of DIALIGN is up to 120 times faster than the old version. Availability: http://bibiserv.TechFak.Uni-Bielefeld.DE/dialign/
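To make the consistency question concrete, the following is a minimal C sketch of a naive batch check. It is not GABIOS-LIB's interface or algorithm: the names `aligned_pair`, `find_class`, `pairs_are_consistent`, and `MAX_POS` are hypothetical, and the sketch assumes that residues are numbered globally, one sequence after another, and that every index passed in is valid. It takes the transitive closure of the aligned pairs with a union-find structure, then tests whether the resulting classes can be ordered compatibly with the residue order inside each sequence, that is, whether the graph of classes is acyclic.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_POS 1024 /* illustrative cap on the total number of residues */

/* Hypothetical type: two residues asserted to be aligned, given as
 * global indices (sequences are numbered through one after another). */
struct aligned_pair { int a, b; };

static int parent[MAX_POS]; /* union-find forest over residues */

static int find_class(int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]]; /* path halving */
        x = parent[x];
    }
    return x;
}

/* Returns true if a single alignment can contain all the given pairs. */
bool pairs_are_consistent(int n_seq, const int *len,
                          int n_pairs, const struct aligned_pair *pairs)
{
    bool is_last[MAX_POS];
    int indeg[MAX_POS], queue[MAX_POS];
    int n_pos = 0, n_classes = 0, head = 0, tail = 0;

    for (int s = 0; s < n_seq; s++)
        n_pos += len[s];
    assert(n_pos <= MAX_POS);

    for (int p = 0; p < n_pos; p++) {
        parent[p] = p;
        is_last[p] = false;
        indeg[p] = 0;
    }
    /* Mark the last residue of each sequence: it has no successor. */
    for (int s = 0, start = 0; s < n_seq; start += len[s], s++)
        if (len[s] > 0)
            is_last[start + len[s] - 1] = true;

    /* Transitive closure: merge the classes of every aligned pair. */
    for (int k = 0; k < n_pairs; k++) {
        int a = find_class(pairs[k].a), b = find_class(pairs[k].b);
        if (a != b)
            parent[a] = b;
    }

    /* Sequence order induces an edge class(p) -> class(p + 1). */
    for (int p = 0; p < n_pos; p++)
        if (!is_last[p])
            indeg[find_class(p + 1)]++;

    /* Kahn's algorithm: consistent iff the class graph is acyclic. */
    for (int p = 0; p < n_pos; p++)
        if (find_class(p) == p) {
            n_classes++;
            if (indeg[p] == 0)
                queue[tail++] = p;
        }
    while (head < tail) {
        int c = queue[head++];
        for (int p = 0; p < n_pos; p++)
            if (!is_last[p] && find_class(p) == c) {
                int d = find_class(p + 1);
                if (--indeg[d] == 0)
                    queue[tail++] = d;
            }
    }
    return tail == n_classes; /* every class was ordered: no cycle */
}
```

For three sequences of length 2 (global indices 0-1, 2-3, 4-5), the pairs (1,2), (3,4) and (5,0) are compatible two at a time but inconsistent together, which is why a pairwise test is not enough. The sketch rechecks everything from scratch in time proportional to the number of classes times the number of residues; a program that tests many candidate fragments during alignment would presumably need something much cheaper per test.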
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abdeddaïm, S., Morgenstern, B. (2001). Speeding Up the DIALIGN Multiple Alignment Program by Using the ‘Greedy Alignment of BIOlogical Sequences LIBrary’ (GABIOS-LIB). In: Gascuel, O., Sagot, MF. (eds) Computational Biology. JOBIM 2000. Lecture Notes in Computer Science, vol 2066. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45727-5_1
DOI: https://doi.org/10.1007/3-540-45727-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42242-6
Online ISBN: 978-3-540-45727-5
eBook Packages: Springer Book Archive