A robust model for finding optimal evolutionary trees

Farach, M.; Kannan, S.; Warnow, T.

doi:10.1007/BF01188585

Published: February 1995

Volume 13, pages 155–179 (1995)
Algorithmica

Abstract

Constructing evolutionary trees for species sets is a fundamental problem in computational biology. One of the standard models assumes the ability to compute distances between every pair of species, and seeks to find an edge-weighted tree T in which the distance d^T_ij in the tree between the leaves of T corresponding to the species i and j exactly equals the observed distance d_ij. When such a tree exists, this is expressed in the biological literature by saying that the distance function or matrix is additive, and trees can be constructed from additive distance matrices in O(n²) time. Real distance data are hardly ever additive, and we therefore need ways of modeling the problem of finding the best-fit tree as an optimization problem.
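To make the notion of an additive matrix concrete, the following is a minimal sketch of a brute-force test based on the classical four-point condition: for every four species, the two largest of the three pairwise sums d_ij + d_kl, d_ik + d_jl, d_il + d_jk must be equal. The function name `is_additive`, the tolerance parameter, and the example matrices are assumptions made for illustration. The sketch assumes a symmetric, nonnegative matrix with a zero diagonal, runs in O(n⁴) time, and is not the O(n²) tree-construction method mentioned above.

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """Brute-force additivity test for a symmetric matrix with zero diagonal.

    Checks the triangle inequality on every triple and the four-point
    condition on every quadruple of indices.
    """
    n = len(d)
    # Triangle inequality on all triples.
    for i, j, k in combinations(range(n), 3):
        if (d[i][j] > d[i][k] + d[k][j] + tol
                or d[i][k] > d[i][j] + d[j][k] + tol
                or d[j][k] > d[j][i] + d[i][k] + tol):
            return False
    # Four-point condition: the two largest pairwise sums must coincide.
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Leaf-to-leaf distances in a hypothetical tree on species a, b, c, d with
# pendant edges 1, 2, 3, 4 and an internal edge of weight 5 separating
# {a, b} from {c, d}.
quartet = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]

# The same data with one "measurement error" in the (a, c) entry.
perturbed = [row[:] for row in quartet]
perturbed[0][2] = perturbed[2][0] = 8

print(is_additive(quartet))    # True
print(is_additive(perturbed))  # False
```

A single perturbed entry is enough to break additivity, which illustrates why exact tree fitting is too strict for real data.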

In this paper we present several natural and realistic ways of modeling the inaccuracies in the distance data. In one model we assume that we have upper and lower bounds for the distances between pairs of species and try to find an additive distance matrix between these bounds. In a second model we are given a partial matrix and asked to find if we can fill in the unspecified entries in order to make the entire matrix additive. For both of these models we also consider a more restrictive problem of finding a matrix that fits a tree which is not only additive but also ultrametric. Ultrametric matrices correspond to trees which can be rooted so that the distance from the root to any leaf is the same. Ultrametric matrices are desirable in biology since the edge weights then indicate evolutionary time. We give polynomial-time algorithms for some of the problems while showing others to be NP-complete. We also consider various ways of “fitting” a given distance matrix (or a pair of upper- and lower-bound matrices) to a tree in order to minimize various criteria of error in the fit. For most criteria this optimization problem turns out to be NP-hard, while we do get polynomial-time algorithms for some.
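The ultrametric version of the bounds model can be illustrated with a short sketch. It relies on a standard fact: among all ultrametrics that lie entrywise below an upper-bound matrix there is a unique largest one (the subdominant ultrametric), given by minimax path distances. If any ultrametric fits between the bounds, the subdominant ultrametric of the upper bound does too, so it suffices to compute it and compare it with the lower bound. The function names and example matrices below are assumptions for illustration, and this O(n³) Floyd-Warshall-style computation is one simple way to decide the question, not necessarily the algorithm given in the paper.

```python
def subdominant_ultrametric(high):
    """Largest ultrametric that is entrywise <= high.

    Assumes high is symmetric with a zero diagonal. Entry (i, j) becomes the
    minimax path distance: the smallest possible value of the heaviest step
    on any path from i to j.
    """
    n = len(high)
    m = [row[:] for row in high]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(m[i][k], m[k][j])
                if via_k < m[i][j]:
                    m[i][j] = via_k
    return m


def ultrametric_sandwich(low, high):
    """Return an ultrametric m with low <= m <= high entrywise, or None."""
    m = subdominant_ultrametric(high)
    n = len(m)
    for i in range(n):
        for j in range(n):
            if m[i][j] < low[i][j]:
                return None  # even the largest candidate violates a lower bound
    return m


# Hypothetical bounds on three species.
high = [[0, 4, 9], [4, 0, 6], [9, 6, 0]]
low_ok = [[0, 3, 5], [3, 0, 5], [5, 5, 0]]
low_bad = [[0, 3, 7], [3, 0, 5], [7, 5, 0]]

print(ultrametric_sandwich(low_ok, high))   # [[0, 4, 6], [4, 0, 6], [6, 6, 0]]
print(ultrametric_sandwich(low_bad, high))  # None
```

In the second call, any ultrametric below the upper bound must place species 0 and 2 at distance at most max(4, 6) = 6, so a lower bound of 7 cannot be met.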

Author information

Authors and Affiliations

M. Farach: DIMACS, Rutgers University, Box 1179, Piscataway, NJ 08855, USA
S. Kannan: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA
T. Warnow: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA

Additional information

Communicated by E. W. Myers.

Supported by DIMACS under NSF Contract STC-88-09648.

Supported by NSF Grant CCR-9108969.

This work was begun while this author was visiting DIMACS in July and August 1992, and was supported in part by the U.S. Department of Energy under Contract DE-AC04-76DP00789.

About this article

Cite this article

Farach, M., Kannan, S. & Warnow, T. A robust model for finding optimal evolutionary trees. Algorithmica 13, 155–179 (1995). https://doi.org/10.1007/BF01188585

Received: 15 October 1992
Revised: 15 February 1993
Issue Date: February 1995
DOI: https://doi.org/10.1007/BF01188585
