Abstract
Given a multiple alignment of orthologous DNA sequences and a phylogenetic tree for these sequences, we investigate the problem of reconstructing the most likely scenario of insertions and deletions capable of explaining the gaps observed in the alignment. This problem, which we call the Indel Maximum Likelihood Problem (IMLP), is an important step toward the reconstruction of ancestral genomic sequences and is relevant to the study of evolutionary processes and genome function. We solve the IMLP using a new type of tree hidden Markov model whose states correspond to single-base evolutionary scenarios and whose transitions model dependencies between neighboring columns. The standard Viterbi and forward-backward algorithms are optimized to produce the most likely ancestral reconstruction and to compute the level of confidence associated with specific regions of the reconstruction. The method is illustrated on a set of 85 kb sequences from eight mammals.
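To make the roles of the two algorithms concrete, the following is a minimal sketch of textbook Viterbi decoding and forward-backward posteriors on an ordinary two-state HMM. It is not the tree hidden Markov model of the paper: the state names ("match", "indel"), the column symbols ("base", "gap"), and every probability below are hypothetical values chosen only for illustration. Viterbi returns the single most likely state path, while forward-backward gives a per-column posterior that can be read as a confidence level.

```python
import math

def viterbi(obs, states, init, trans, emit):
    """Most likely state path of a generic discrete HMM (log space)."""
    # score[s]: best log-probability of any path ending in state s
    score = {s: math.log(init[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []  # back[i][s]: best predecessor of state s at column i + 1
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans[r][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][o])
            ptr[s] = best
        back.append(ptr)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):  # trace the back-pointers to the first column
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path

def posteriors(obs, states, init, trans, emit):
    """Per-column posterior state probabilities via forward-backward."""
    fwd = [{s: init[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = fwd[-1]
        fwd.append({s: emit[s][o] * sum(prev[r] * trans[r][s] for r in states)
                    for s in states})
    bwd = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        nxt = bwd[0]
        bwd.insert(0, {s: sum(trans[s][r] * emit[r][o] * nxt[r] for r in states)
                       for s in states})
    total = sum(fwd[-1].values())  # likelihood of the whole observation sequence
    return [{s: f[s] * b[s] / total for s in states} for f, b in zip(fwd, bwd)]

# Hypothetical toy model: all names and numbers are assumptions for illustration.
STATES = ["match", "indel"]
INIT = {"match": 0.9, "indel": 0.1}
TRANS = {"match": {"match": 0.95, "indel": 0.05},
         "indel": {"match": 0.30, "indel": 0.70}}
EMIT = {"match": {"base": 0.99, "gap": 0.01},
        "indel": {"base": 0.05, "gap": 0.95}}
COLUMNS = ["base", "base", "gap", "gap", "gap", "base"]

if __name__ == "__main__":
    log_p, path = viterbi(COLUMNS, STATES, INIT, TRANS, EMIT)
    print(path)
    for col, post in zip(COLUMNS, posteriors(COLUMNS, STATES, INIT, TRANS, EMIT)):
        print(col, round(post["indel"], 3))
```

Because the transition probabilities favor staying in the same state, the run of gap columns is decoded as one contiguous event rather than as independent columns, which is the kind of neighboring-column dependency the abstract refers to. For long alignments the forward-backward pass would normally be rescaled or run in log space to avoid underflow; that step is left out here for brevity.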
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Diallo, A.B., Makarenkov, V., Blanchette, M. (2006). Finding Maximum Likelihood Indel Scenarios. In: Bourque, G., El-Mabrouk, N. (eds) Comparative Genomics. RCG 2006. Lecture Notes in Computer Science, vol 4205. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11864127_14
DOI: https://doi.org/10.1007/11864127_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44529-6
Online ISBN: 978-3-540-44530-2
eBook Packages: Computer Science, Computer Science (R0)