Abstract
A central problem in computational biology is the inference of phylogeny given a set of DNA or protein sequences. Currently, this problem is tackled stepwise, with phylogenetic reconstruction dependent on an initial multiple sequence alignment step. However, these two steps are fundamentally interdependent. Whether the main interest is in sequence alignment or in phylogeny, a major goal of computational biology is the co-estimation of both. Here we present a first step towards this goal by developing an extension of the Felsenstein peeling algorithm. Given an alignment, our extension analytically integrates out both substitution and insertion-deletion events within a proper statistical model. This new algorithm provides a solution to two important problems in computational biology. First, indel events become informative for phylogenetic reconstruction; second, phylogenetic uncertainty can be included in the estimation of insertion-deletion parameters. We illustrate the practicality of this algorithm within a Bayesian Markov chain Monte Carlo framework by demonstrating it on a non-trivial analysis of a multiple alignment of ten globin protein sequences.
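For readers unfamiliar with the peeling recursion that the abstract builds on, the following is a minimal sketch of the classic substitution-only version. It is not the authors' indel-aware extension. It assumes a Jukes-Cantor substitution model with uniform base frequencies, a tree encoded as nested tuples of (child, branch length) pairs, and gap characters treated as missing data. All function names and the tree encoding are hypothetical, chosen only for illustration.

```python
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """Jukes-Cantor probability of base i becoming base j over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial_likelihoods(node, column):
    """Return the likelihood of the data below `node` for each base at `node`.

    A leaf is a string (sequence name); an internal node is a tuple of
    (child, branch_length) pairs. `column` maps leaf names to characters.
    """
    if isinstance(node, str):
        observed = column[node]
        if observed == "-":
            # Assumption: a gap is treated as missing data, so it carries
            # no information about the tree in this substitution-only sketch.
            return [1.0] * 4
        return [1.0 if b == observed else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partial_likelihoods(child, column)
        for i, parent_base in enumerate(BASES):
            # Sum over the unobserved base at the child end of the branch.
            result[i] *= sum(
                jc69_prob(parent_base, b, t) * child_l[j]
                for j, b in enumerate(BASES)
            )
    return result

def column_likelihood(tree, column):
    """Likelihood of one alignment column, with uniform root frequencies."""
    return sum(0.25 * l for l in partial_likelihoods(tree, column))

def alignment_log_likelihood(tree, alignment):
    """Sum of per-column log-likelihoods, assuming independent columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        math.log(column_likelihood(tree, {k: s[i] for k, s in alignment.items()}))
        for i in range(length)
    )

if __name__ == "__main__":
    # Hypothetical three-taxon tree: ((a:0.1, b:0.1):0.05, c:0.2)
    tree = (((("a", 0.1), ("b", 0.1)), 0.05), ("c", 0.2))
    alignment = {"a": "ACG-T", "b": "ACGAT", "c": "AAG-T"}
    print(alignment_log_likelihood(tree, alignment))
```

In this sketch a gap contributes a factor of one at its leaf, so the gapped columns say nothing about where insertions or deletions occurred on the tree. That is the information the abstract proposes to recover by integrating over indel events as well as substitutions.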
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lunter, G., Miklós, I., Drummond, A., Jensen, J.L., Hein, J. (2003). Bayesian Phylogenetic Inference under a Statistical Insertion-Deletion Model. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_18
DOI: https://doi.org/10.1007/978-3-540-39763-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20076-5
Online ISBN: 978-3-540-39763-2
eBook Packages: Springer Book Archive