Skip to main content

Advertisement

Log in

A branch-and-cut algorithm for multiple sequence alignment

  • Published:
Mathematical Programming Submit manuscript

Abstract

We consider a branch-and-cut approach for solving the multiple sequence alignment problem, which is a central problem in computational biology. We propose a general model for this problem in which arbitrary gap costs are allowed. An interesting aspect of our approach is that the three (exponentially large) classes of natural valid inequalities that we consider turn out to be both facet-defining for the convex hull of integer solutions and separable in polynomial time. Both the proofs that these classes of valid inequalities are facet-defining and the description of the separation algorithms are far from trivial. Experimental results on several benchmark instances show that our method outperforms the best tools developed so far, in that it produces alignments that are better from a biological point of view. A noteworthy outcome of the results is the effectiveness of using branch-and-cut with only a carefully-selected subset of the variables as a heuristic.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Achterberg, T.: SCIP - a framework to integrate constraint and mixed integer programming. Technical Report 04-19, Zuse Institute Berlin, 2004. http://www.zib.de/bib/pub/pw

  2. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)

    Article  Google Scholar 

  3. Bienstock, D.: Potential function methods for approximately solving linear programming problems, Theory and Practice. Kluwer Academic Publishers, Boston, 2002

  4. Carr, R.D., Lancia, G.: Compact vs exponential-size lp relaxations. Operations Research Letters 30, 57–65 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  5. Carr, R.D., Lancia, G.: Compact optimization can outperform separation: A case study in structural proteomics. 4OR 2, 221–233 (2004)

    Google Scholar 

  6. Carrillo, H., Lipman, D.J.: The multiple sequence alignment problem in biology. SIAM J. Appl. Math. 48 (5), 1073–1082 (1988)

    MathSciNet  Google Scholar 

  7. Dayhoff, M., Schwartz, R., Orcut, B.: A model of evolutionary change in proteins. In: M. Dayhoff (ed.) Atlas of Protein Sequence and Structure, vol 5, National Biomedical Research Foundation, Washington, D.C., 1979, pp 345–352

  8. Delcher, A., Kasif, S., Fleischmann, R., J. Peterson, W. O., Salzberg, S.: Alignment of whole genomes. Nucleic Acids Res 27, 2369–2376 (1999)

    Google Scholar 

  9. Eppstein, D.: Sequence comparison with mixed convex and concave costs. J Algorithms (11), 85–101 (1990)

    Google Scholar 

  10. Fischetti, M., Toth, P.: A polyhedral approach to the asymmetric traveling salesman problem. Management Sci 43 (11), 1520–1536 (1997)

    Google Scholar 

  11. Garey, M., Johnson, D.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, 1979

  12. Golumbic, M.C.: Algorithmic graph theory and perfect graphs. Academic Press, New York, 1980

  13. Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708 (1982)

    Article  Google Scholar 

  14. Gupta, S., Kececioglu, J., Schaeffer, A.: Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment. J. Comput. Biol. 2, 459–472 (1995)

    Article  Google Scholar 

  15. Gusfield, D.: Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, Cambridge, 1997

  16. Henikoff, S., Henikoff, J.: Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Science 89, 10915–10919 (1992)

    Google Scholar 

  17. Kipp Martin, R.: Using separation algorithms to generate mixed integer model reformulations. Oper. Res. Lett. 10, 119–128 (1991)

    MATH  MathSciNet  Google Scholar 

  18. Larmore, L., Schieber, B.: Online dynamic programming with applications to the prediction of rna secondary structure. In: Proceedings of the First Symposium on Discrete Algorithms 1990, pp 503–512

  19. LEDA (Library of Efficient Data Types and Algorithms), 2004. http://www.algorithmic-solutions.com

  20. Lenhof, H.-P., Morgenstern, B., Reinert, K.: An exact solution for the segment-to-segment multiple sequence alignment problem. Bioinformatics 15 (3), 203–210 (1999)

    Google Scholar 

  21. Lermen, M., Reinert, K.: The practical use of the algorithm for exact multiple sequence alignment. J. Comput. Biol. 7(5), 655–673 (2000)

    Google Scholar 

  22. Notredame, C., Higgins, D.G., Heringa, J.: T-coffee : A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000)

    Article  Google Scholar 

  23. Reinert, K.: A Polyhedral Approach to Sequence Alignment Problems. PhD thesis, Universität des Saarlandes, 1999

  24. Reinert, K., Lenhof, H.-P., Mutzel, P., Mehlhorn, K., Kececioglu, J.: A branch-and-cut algorithm for multiple sequence alignment. In: Proceedings of the First Annual International Conference on Computational Molecular Biology (RECOMB-97), 1997, pp 241–249

  25. K. Reinert, J. Stoye, and T. Will. An iterative methods for faster sum-of-pairs multiple sequence alignment. BIOINFORMATICS 16(9):808–814, 2000.

    Google Scholar 

  26. SCIL–Symbolic Constraints for Integer Linear programming, 2002. http://www.mpi-sb.mpg.de/SCIL

  27. Thompson, J.D., Plewniak, F., Poch, O.: BAliBASE: A benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15 (1), 87–88 (1999) http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE/prog_scores.html

    Google Scholar 

  28. Wang, L., Jiang, T.: On the complexity of multiple sequence alignment. J. Comput. Biol. 1, 337–348 (1994)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Additional information

Received: April, 2004

Rights and permissions

Reprints and permissions

About this article

Cite this article

Althaus, E., Caprara, A., Lenhof, HP. et al. A branch-and-cut algorithm for multiple sequence alignment. Math. Program. 105, 387–425 (2006). https://doi.org/10.1007/s10107-005-0659-3

Download citation

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10107-005-0659-3

Keywords

Navigation