Motif Yggdrasil: Sampling from a Tree Mixture Model

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3909)

Abstract

In phylogenetic footprinting, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree, so taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. The use of the tree achieves a substantial increase in nucleotide-level correlation coefficient both for synthetic data and for 37 bacterial lexA genes.
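
The nucleotide-level correlation coefficient mentioned in the abstract can be illustrated with a short sketch. It assumes the commonly used definition, namely the Pearson correlation between binary per-nucleotide labels (inside a true site or not, inside a predicted site or not). The function name `nucleotide_cc` and the `(start, length)` site encoding are hypothetical choices made for this illustration; the paper's exact evaluation protocol is not given on this page.

```python
from math import sqrt


def nucleotide_cc(true_sites, predicted_sites, seq_len):
    """Nucleotide-level correlation coefficient for one sequence.

    Sites are (start, length) pairs with 0-based starts. This encoding is
    an assumption of the sketch, not something taken from the paper.
    """

    def mask(sites):
        # Mark every position covered by at least one site.
        covered = [False] * seq_len
        for start, length in sites:
            for i in range(max(start, 0), min(start + length, seq_len)):
                covered[i] = True
        return covered

    truth = mask(true_sites)
    pred = mask(predicted_sites)

    # Per-nucleotide confusion counts.
    tp = sum(t and p for t, p in zip(truth, pred))
    fp = sum((not t) and p for t, p in zip(truth, pred))
    fn = sum(t and (not p) for t, p in zip(truth, pred))
    tn = sum((not t) and (not p) for t, p in zip(truth, pred))

    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # The coefficient is undefined when any marginal is empty; return 0.0
    # in that case as a convention of this sketch.
    return (tp * tn - fn * fp) / denom if denom else 0.0


if __name__ == "__main__":
    # A predicted site shifted by two positions against a 4-nucleotide true site.
    print(nucleotide_cc([(0, 4)], [(2, 4)], seq_len=12))
```

A perfect prediction gives a coefficient of 1, and a prediction that overlaps no true nucleotide gives a value at or below 0, so improvements from using the tree would show up as values closer to 1.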

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Andersson, S.A., Lagergren, J. (2006). Motif Yggdrasil: Sampling from a Tree Mixture Model. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science, vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_39

  • DOI: https://doi.org/10.1007/11732990_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33295-4

  • Online ISBN: 978-3-540-33296-1

  • eBook Packages: Computer Science, Computer Science (R0)
