Novel evolutionary models and applications to sequence alignment problems

Lee, Eva K.; Easton, Todd; Gupta, Kapil

doi:10.1007/s10479-006-0085-9

Published: 29 September 2006

Volume 148, pages 167–187, (2006)

Annals of Operations Research

Eva K. Lee¹,², Todd Easton¹ & Kapil Gupta¹

Abstract

In this paper, we present a novel graph-theoretical approach for representing a wide variety of sequence analysis problems within a single model. The model allows incorporation of the operations “insertion”, “deletion”, and “substitution”, and various parameters such as relative distances and weights. Conceptually, we refer to the problem as the minimum weight common mutated sequence (MWCMS) problem. The MWCMS model has many applications, including the multiple sequence alignment problem, phylogenetic analysis, the DNA sequencing problem, and the sequence comparison problem, which together encompass a core set of very difficult problems in computational biology. Thus the model presented in this paper lays out a mathematical modeling framework that allows one to investigate theoretical and computational issues, and to forge new advances for these distinct but related problems.
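To make the three operations and their weights concrete, the following is a minimal sketch of the classical two-sequence weighted edit distance. It is not the MWCMS formulation from the paper; the function name `weighted_edit_distance` and the parameters `w_ins`, `w_del`, and `w_sub` are assumptions chosen for illustration.

```python
def weighted_edit_distance(a, b, w_ins=1.0, w_del=1.0, w_sub=1.0):
    """Minimum total weight of edits that turn sequence a into sequence b.

    Illustrative only: the weights are hypothetical parameters, not values
    taken from the paper.
    """
    m, n = len(a), len(b)
    # dist[i][j] holds the cheapest way to turn a[:i] into b[:j].
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i * w_del  # delete every symbol of a[:i]
    for j in range(1, n + 1):
        dist[0][j] = j * w_ins  # insert every symbol of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = a[i - 1] == b[j - 1]
            dist[i][j] = min(
                dist[i - 1][j - 1] + (0.0 if same else w_sub),  # match or substitute
                dist[i - 1][j] + w_del,                          # delete a[i-1]
                dist[i][j - 1] + w_ins,                          # insert b[j-1]
            )
    return dist[m][n]
```

With unit weights this reduces to the Levenshtein distance. Raising `w_sub` above `w_ins + w_del` makes a deletion followed by an insertion cheaper than a substitution, which shows how the choice of weights changes the preferred mutation path.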

Through the introduction of supernodes and the multi-layer supergraph, we proved that MWCMS is NP-complete. Furthermore, we showed that a conflict graph derived from the multi-layer supergraph has the property that a solution to the associated node-packing problem on the conflict graph corresponds to a solution of the MWCMS problem. Using this correspondence, we proved that when the number of input sequences is a constant, MWCMS is polynomial-time solvable. We also demonstrated that some well-known combinatorial problems can be viewed as special cases of the MWCMS problem. In particular, we presented theoretical results implied by the MWCMS theory for the minimum weight supersequence problem, the minimum weight superstring problem, and the longest common subsequence problem.
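As a small illustration of one of the special cases named above, the sketch below solves the longest common subsequence problem for two sequences with the textbook dynamic program. This is the standard algorithm, not the conflict-graph method of the paper, and the function name `longest_common_subsequence` is an assumption. It runs in time proportional to the product of the two lengths, in line with the polynomial-time result for a constant number of input sequences.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # length[i][j] is the LCS length of a[:i] and b[:j].
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                length[i + 1][j + 1] = length[i][j] + 1
            else:
                length[i + 1][j + 1] = max(length[i][j + 1], length[i + 1][j])

    # Walk back from the bottom-right corner to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

For example, `longest_common_subsequence("AGGTAB", "GXTXAYB")` returns `"GTAB"`.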

Two integer programming formulations were presented and a simple yet elegant decomposition heuristic was introduced. The integer programming instances have proven to be computationally intensive. Consequently, research involving simultaneous column and row generation and parallel computing will be explored. The heuristic algorithm, introduced herein for multiple sequence alignment, overcomes the order-dependent drawbacks of many existing algorithms, and is capable of returning good sequence alignments within reasonable computational time. It is able to return the optimal alignment for multiple sequences of length less than 1500 base pairs within 30 minutes. Its decomposition-based design lends itself naturally to parallel distributed computing, and we continue to explore its flexibility and scalability in a massively parallel environment.

Author information

Authors and Affiliations

Center for Operations Research in Medicine, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia
Eva K. Lee, Todd Easton & Kapil Gupta
Winship Cancer Institute, Emory University School of Medicine, Atlanta, Georgia
Eva K. Lee

Corresponding author

Correspondence to Eva K. Lee.

About this article

Cite this article

Lee, E.K., Easton, T. & Gupta, K. Novel evolutionary models and applications to sequence alignment problems. Ann Oper Res 148, 167–187 (2006). https://doi.org/10.1007/s10479-006-0085-9

Published: 29 September 2006
Issue Date: November 2006
DOI: https://doi.org/10.1007/s10479-006-0085-9

Part of a collection:

Special Section: OR in Medicine and Health Care
