Novel evolutionary models and applications to sequence alignment problems

Annals of Operations Research

Abstract

In this paper, we present a novel graph-theoretical approach for representing a wide variety of sequence analysis problems within a single model. The model allows incorporation of the operations “insertion”, “deletion”, and “substitution”, and various parameters such as relative distances and weights. Conceptually, we refer to the problem as the minimum weight common mutated sequence (MWCMS) problem. The MWCMS model has many applications, including the multiple sequence alignment problem, phylogenetic analysis, the DNA sequencing problem, and the sequence comparison problem, which together encompass a core set of very difficult problems in computational biology. Thus the model presented in this paper lays out a mathematical modeling framework that allows one to investigate theoretical and computational issues, and to forge new advances for these distinct but related problems.
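To make the three operations concrete, the sketch below computes a weighted edit distance between two sequences with the classical dynamic program. It is an illustration of the pairwise special case only, not the MWCMS model itself; the function name `weighted_edit_distance` and the uniform per-operation costs `ins_cost`, `del_cost`, and `sub_cost` are assumptions introduced here.

```python
def weighted_edit_distance(source, target, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Minimum total weight of insertions, deletions, and substitutions
    that turn `source` into `target` (classical pairwise dynamic program).

    The uniform per-operation costs are illustrative assumptions.
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]

    # Turning a prefix of `source` into the empty string needs only deletions.
    for i in range(1, rows):
        dist[i][0] = i * del_cost
    # Turning the empty string into a prefix of `target` needs only insertions.
    for j in range(1, cols):
        dist[0][j] = j * ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            # Matching characters cost nothing; mismatches cost one substitution.
            step = 0.0 if source[i - 1] == target[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,   # delete source[i-1]
                dist[i][j - 1] + ins_cost,   # insert target[j-1]
                dist[i - 1][j - 1] + step,   # match or substitute
            )
    return dist[-1][-1]


if __name__ == "__main__":
    print(weighted_edit_distance("ACGT", "AGT"))
```

Changing the relative costs changes which operations the optimum prefers: when a substitution is priced above one deletion plus one insertion, the program replaces characters by deleting and reinserting instead.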

Through the introduction of supernodes, and the multi-layer supergraph, we proved that MWCMS is \({NP}\)-complete. Furthermore, it was shown that a conflict graph derived from the multi-layer supergraph has the property that a solution to the associated node-packing problem of the conflict graph corresponds to a solution of the MWCMS problem. In this case, we proved that when the number of input sequences is a constant, MWCMS is polynomial-time solvable. We also demonstrated that some well-known combinatorial problems can be viewed as special cases of the MWCMS problem. In particular, we presented theoretical results implied by the MWCMS theory for the minimum weight supersequence problem, the minimum weight superstring problem, and the longest common subsequence problem.
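As an illustration of why a constant number of input sequences keeps such problems tractable, the sketch below solves the longest common subsequence problem, one of the special cases named above, for any fixed number of sequences. It uses the textbook dynamic program over tuples of prefix lengths, not the conflict-graph argument of the paper; the function name `lcs_length` is an assumption. With k sequences of length at most n there are at most (n+1)^k states, which is polynomial in n when k is constant.

```python
from functools import lru_cache


def lcs_length(sequences):
    """Length of a longest common subsequence of all given sequences.

    Textbook dynamic program over tuples of prefix lengths; written
    recursively, so it is only suitable for short inputs.
    """
    seqs = tuple(sequences)
    if not seqs:
        return 0

    @lru_cache(maxsize=None)
    def best(positions):
        # positions[i] is the length of the prefix of seqs[i] still considered.
        if any(p == 0 for p in positions):
            return 0
        last_chars = {seq[p - 1] for seq, p in zip(seqs, positions)}
        if len(last_chars) == 1:
            # All prefixes end in the same character: use it and shrink every prefix.
            return 1 + best(tuple(p - 1 for p in positions))
        # Otherwise at least one final character is unused: try dropping each in turn.
        return max(
            best(positions[:i] + (positions[i] - 1,) + positions[i + 1:])
            for i in range(len(positions))
        )

    return best(tuple(len(s) for s in seqs))


if __name__ == "__main__":
    print(lcs_length(["ACGT", "AGT", "GT"]))
```

The state space grows exponentially in the number of sequences, which is consistent with the general problem being hard when that number is part of the input.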

Two integer programming formulations were presented and a simple yet elegant decomposition heuristic was introduced. The integer programming instances have proven to be computationally intensive. Consequently, research involving simultaneous column and row generation and parallel computing will be explored. The heuristic algorithm, introduced herein for multiple sequence alignment, overcomes the order-dependent drawbacks of many of the existing algorithms, and is capable of returning good sequence alignments within reasonable computational time. It is able to return the optimal alignment for multiple sequences of length less than 1500 base pairs within 30 minutes. Its decomposition-based design lends itself naturally to parallel distributed computing, and we continue to explore its flexibility and scalability in a massively parallel environment.

Author information

Corresponding author

Correspondence to Eva K. Lee.

About this article

Cite this article

Lee, E.K., Easton, T. & Gupta, K. Novel evolutionary models and applications to sequence alignment problems. Ann Oper Res 148, 167–187 (2006). https://doi.org/10.1007/s10479-006-0085-9

  • DOI: https://doi.org/10.1007/s10479-006-0085-9
