Abstract
We consider a branch-and-cut approach for solving the multiple sequence alignment problem, which is a central problem in computational biology. We propose a general model for this problem in which arbitrary gap costs are allowed. An interesting aspect of our approach is that the three (exponentially large) classes of natural valid inequalities that we consider turn out to be both facet-defining for the convex hull of integer solutions and separable in polynomial time. Both the proofs that these classes of valid inequalities are facet-defining and the description of the separation algorithms are far from trivial. Experimental results on several benchmark instances show that our method outperforms the best tools developed so far, in that it produces alignments that are better from a biological point of view. A noteworthy outcome of the results is the effectiveness of using branch-and-cut with only a carefully selected subset of the variables as a heuristic.
Received: April, 2004
Althaus, E., Caprara, A., Lenhof, HP. et al. A branch-and-cut algorithm for multiple sequence alignment. Math. Program. 105, 387–425 (2006). https://doi.org/10.1007/s10107-005-0659-3
DOI: https://doi.org/10.1007/s10107-005-0659-3