Speeding Up the DIALIGN Multiple Alignment Program by Using the ‘Greedy Alignment of BIOlogical Sequences LIBrary’ (GABIOS-LIB)

  • Conference paper
Computational Biology (JOBIM 2000)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2066))

Abstract

A sensitive method for multiple sequence alignment should be able to align local motifs that are contained in some, but not necessarily all, of the input sequences. In addition, it should be possible to integrate several such partial local alignments into a single multiple output alignment. This leads to the question of the consistency of partial alignments. Based on a new set-theoretical definition of sequence alignment, the consistency problem is discussed theoretically, and a recently developed library of C functions for consistency calculation (GABIOS-LIB) is described. GABIOS-LIB has been integrated into the DIALIGN alignment program to carry out consistency tests during the multiple alignment procedure. While the resulting alignments are exactly the same as with the previous version of DIALIGN, the running time of the program has improved substantially. For large data sets, the new version of DIALIGN is up to 120 times faster than the old version. Availability: http://bibiserv.TechFak.Uni-Bielefeld.DE/dialign/
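
To make the consistency question concrete, the following is a minimal Python sketch and not the GABIOS-LIB implementation or its API. It assumes that partial alignments have been reduced to pairs of aligned sites, each site written as a (sequence, position) tuple, and that a set of pairs counts as consistent when all of them fit into one multiple alignment: no column may contain two sites of the same sequence, and the columns must admit a left-to-right order that agrees with every sequence. The names is_consistent and greedy_select are hypothetical, and the check is recomputed from scratch for each candidate, which is the naive approach rather than an incremental one.

```python
from collections import defaultdict


def is_consistent(pairs):
    """Return True if the aligned site pairs fit into one multiple alignment."""
    parent = {}

    def find(site):
        # Union-find lookup with path halving.
        parent.setdefault(site, site)
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    # Sites linked by alignment (directly or transitively) share a column.
    for a, b in pairs:
        parent[find(a)] = find(b)

    sites = list(parent)
    column_of = {site: find(site) for site in sites}

    # A column may hold at most one site per sequence.
    seen = set()
    for site, column in column_of.items():
        key = (column, site[0])
        if key in seen:
            return False
        seen.add(key)

    # Within each sequence, earlier sites must lie in earlier columns.
    by_seq = defaultdict(list)
    for site in sites:
        by_seq[site[0]].append(site)
    succ = defaultdict(set)
    for seq_sites in by_seq.values():
        seq_sites.sort()
        for left, right in zip(seq_sites, seq_sites[1:]):
            succ[column_of[left]].add(column_of[right])

    # The columns can be ordered iff this graph is acyclic (Kahn's algorithm).
    columns = set(column_of.values())
    indegree = {column: 0 for column in columns}
    for nexts in succ.values():
        for column in nexts:
            indegree[column] += 1
    ready = [column for column in columns if indegree[column] == 0]
    ordered = 0
    while ready:
        column = ready.pop()
        ordered += 1
        for nxt in succ[column]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return ordered == len(columns)


def greedy_select(scored_pairs):
    """Accept pairs in order of decreasing score, skipping inconsistent ones."""
    accepted = []
    for _score, a, b in sorted(scored_pairs, key=lambda t: t[0], reverse=True):
        if is_consistent(accepted + [(a, b)]):
            accepted.append((a, b))
    return accepted


candidates = [
    (5, ("x", 1), ("y", 2)),
    (4, ("y", 2), ("z", 1)),
    (3, ("z", 1), ("x", 3)),  # would put x1 and x3 in the same column
    (2, ("x", 2), ("y", 1)),  # crosses the accepted pair x1-y2
    (1, ("x", 4), ("y", 5)),
]
selected = greedy_select(candidates)
```

In this toy input the candidates with scores 3 and 2 are rejected, the first because it would place two sites of sequence x in one column and the second because it crosses an already accepted pair. Rechecking every candidate from scratch becomes expensive as the number of candidates grows, which is the kind of cost that the consistency calculation described in the abstract is meant to reduce.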

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abdeddaïm, S., Morgenstern, B. (2001). Speeding Up the DIALIGN Multiple Alignment Program by Using the ‘Greedy Alignment of BIOlogical Sequences LIBrary’ (GABIOS-LIB). In: Gascuel, O., Sagot, MF. (eds) Computational Biology. JOBIM 2000. Lecture Notes in Computer Science, vol 2066. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45727-5_1

  • DOI: https://doi.org/10.1007/3-540-45727-5_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42242-6

  • Online ISBN: 978-3-540-45727-5

  • eBook Packages: Springer Book Archive
