Abstract
In phylogenetic footprinting, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree, so taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed either a star-shaped tree or partially aligned upstream regions. We give a probabilistic model describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. Using the tree yields a substantial increase in the nucleotide-level correlation coefficient for both synthetic data and 37 bacterial lexA genes.
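The abstract does not spell out how the nucleotide-level correlation coefficient is computed. The sketch below assumes the definition commonly used when assessing motif finders: a Pearson-style correlation between predicted and known binding-site positions, counted per nucleotide. The function name `nucleotide_cc` and the representation of sites as sets of positions are illustrative choices, not taken from the paper.

```python
import math


def nucleotide_cc(predicted, actual, seq_len):
    """Nucleotide-level correlation coefficient (assumed standard definition).

    predicted, actual: sets of 0-based nucleotide positions covered by
    predicted and known binding sites, within a sequence of length seq_len.
    """
    tp = len(predicted & actual)   # positions correctly predicted as site
    fp = len(predicted - actual)   # predicted as site, but not a known site
    fn = len(actual - predicted)   # known site positions that were missed
    tn = seq_len - tp - fp - fn    # background correctly left unpredicted
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        # Undefined when one class is empty; returning 0.0 is a convention
        # chosen for this sketch.
        return 0.0
    return (tp * tn - fn * fp) / denom


# Usage: a predicted 4-nt site overlapping half of a known 4-nt site.
score = nucleotide_cc({0, 1, 2, 3}, {2, 3, 4, 5}, seq_len=10)
```

A value of 1 means the predicted positions coincide exactly with the known sites, 0 means no better than chance, and negative values indicate anti-correlation.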
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Andersson, S.A., Lagergren, J. (2006). Motif Yggdrasil: Sampling from a Tree Mixture Model. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science, vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_39
DOI: https://doi.org/10.1007/11732990_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33295-4
Online ISBN: 978-3-540-33296-1
eBook Packages: Computer Science, Computer Science (R0)