Tsukuba BB: A Branch and Bound Algorithm for Local Multiple Sequence Alignment

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1848)

Abstract

In this paper we present a branch and bound algorithm for local gapless multiple sequence alignment (motif alignment), together with its implementation. It is the first program to exploit the fact that the motif alignment problem is easier for short motifs: for a fixed motif width, the running time of the algorithm is asymptotically linear in the size of the input. We tested the performance of the program on a dataset of 300 E. coli promoter sequences. For a motif width of 4, the optimal alignment of the entire set of sequences can be found. For the more natural motif width of 6, the program can align 19 sequences of length 100, more than twice as many as the best previous exact algorithm. When the constraint that every sequence be aligned is relaxed, the algorithm can align 100 of the 300 promoter sequences with a motif width of 6. We also compare the effectiveness of the Gibbs sampling and beam search heuristics on this problem and show that, in some cases, our branch and bound algorithm finds the optimal solution, with proof of optimality, when those heuristics fail to find it.
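
To make the idea of branch and bound for gapless motif alignment concrete, here is a minimal Python sketch. It is not the Tsukuba BB implementation, and it does not attempt the linear-time behaviour described above. The scoring function (sum-of-pairs match count), the bounding rule, and all names (`windows`, `matches`, `branch_and_bound_motif`) are assumptions made for illustration; the abstract does not specify the paper's objective function or bound.

```python
from itertools import combinations


def windows(seq, w):
    """All length-w substrings of seq, indexed by start position."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def matches(a, b):
    """Number of positions at which two equal-length windows agree."""
    return sum(x == y for x, y in zip(a, b))


def branch_and_bound_motif(seqs, w):
    """Return (score, starts) for an optimal gapless alignment of width w.

    ASSUMPTION: the objective is the sum-of-pairs match count, a stand-in
    chosen for this sketch. Every sequence must have length >= w.
    """
    wins = [windows(s, w) for s in seqs]
    n = len(seqs)

    # best_pair[i][j]: highest match count between any window of sequence i
    # and any window of sequence j (an optimistic value for that pair).
    best_pair = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        best_pair[i][j] = best_pair[j][i] = max(
            matches(a, b) for a in wins[i] for b in wins[j])

    # tail[k]: optimistic total over all pairs whose later member is sequence
    # k or later, i.e. every pair not yet scored once 0..k-1 are fixed.
    tail = [0] * (n + 1)
    for k in range(n - 1, -1, -1):
        tail[k] = tail[k + 1] + sum(best_pair[i][k] for i in range(k))

    best_score, best_starts = -1, None

    def extend(k, starts, score):
        nonlocal best_score, best_starts
        if k == n:  # complete alignment: keep it if it is strictly better
            if score > best_score:
                best_score, best_starts = score, starts
            return
        if score + tail[k] <= best_score:  # bound: cannot beat the incumbent
            return
        # Branch on the start position in sequence k, most promising first.
        gains = [(sum(matches(win, wins[i][starts[i]]) for i in range(k)), p)
                 for p, win in enumerate(wins[k])]
        for gain, p in sorted(gains, key=lambda g: -g[0]):
            extend(k + 1, starts + [p], score + gain)

    extend(0, [], 0)
    return best_score, best_starts


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTACGTTT", "GGGACGTA"]
    score, starts = branch_and_bound_motif(seqs, 4)
    print(score, [s[p:p + 4] for s, p in zip(seqs, starts)])
```

Because the bound never underestimates what the unscored pairs can contribute, a branch is discarded only when it provably cannot improve on the best alignment found so far. That is what allows a search of this kind to return a solution together with a proof of optimality, in contrast to heuristics such as Gibbs sampling or beam search.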

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Horton, P. (2000). Tsukuba BB: A Branch and Bound Algorithm for Local Multiple Sequence Alignment. In: Giancarlo, R., Sankoff, D. (eds) Combinatorial Pattern Matching. CPM 2000. Lecture Notes in Computer Science, vol 1848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45123-4_9

  • DOI: https://doi.org/10.1007/3-540-45123-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67633-1

  • Online ISBN: 978-3-540-45123-5

  • eBook Packages: Springer Book Archive
