Nucleosome Occupancy Information Improves de novo Motif Discovery

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4453)

Abstract

A complete understanding of transcriptional regulatory processes in the cell requires identification of transcription factor binding sites on a genome-wide scale. Unfortunately, these binding sites are typically short and degenerate, posing a significant statistical challenge: many more matches to known transcription factor binding sites occur in the genome than are actually functional. Chromatin structure is known to play an important role in guiding transcription factors to those sites that are functional. In particular, it has been shown that active regulatory regions are usually depleted of nucleosomes, thereby enabling transcription factors to bind DNA in those regions [1]. In this paper, we describe a novel algorithm that employs an informative prior over DNA sequence positions based on a discriminative view of nucleosome occupancy; the nucleosome occupancy information comes from a recently published computational model [2]. When a Gibbs sampling algorithm with our informative prior is applied to yeast sequence sets identified by ChIP-chip [3], the correct motif is found in 50% more cases than with an uninformative uniform prior. Moreover, if nucleosome occupancy information is not available, our informative prior reduces to a new kind of prior that can exploit discriminative information in a purely generative setting.
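The abstract's central idea can be illustrated with a minimal Python sketch: only the prior over where a binding site starts changes, while the Gibbs update stays the same. The sketch assumes a one-site-per-sequence model, a 0th-order background, and a hypothetical mapping from a per-base nucleosome occupancy track (standing in for the output of a model such as [2]) to a positional prior, namely the mean nucleosome-free probability over each candidate window. This is not the authors' prior, which is built from a discriminative view of occupancy. The function names, the toy sequence, and the occupancy values are all illustrative.

```python
import numpy as np

BASES = "ACGT"


def encode(seq):
    """Map an A/C/G/T string to integer indices 0..3."""
    return np.array([BASES.index(b) for b in seq.upper()])


def log_odds_per_start(seq, pwm, background):
    """Log-likelihood ratio (motif vs. background) of a site starting at each offset.

    pwm: (W, 4) array; row j holds the base probabilities of motif column j.
    background: length-4 array of base probabilities (0th-order model, assumed).
    """
    idx = encode(seq)
    W = pwm.shape[0]
    cols = np.arange(W)
    return np.array([
        np.sum(np.log(pwm[cols, idx[i:i + W]]) - np.log(background[idx[i:i + W]]))
        for i in range(len(idx) - W + 1)
    ])


def occupancy_prior(occupancy, W, eps=1e-6):
    """Hypothetical positional prior: weight each start by the mean probability
    that its W-base window is nucleosome-free, then normalise to sum to 1."""
    occupancy = np.asarray(occupancy, dtype=float)
    free = np.array([np.mean(1.0 - occupancy[i:i + W])
                     for i in range(len(occupancy) - W + 1)])
    free = free + eps  # keep every start reachable by the sampler
    return free / free.sum()


def start_posterior(seq, pwm, background, prior):
    """Conditional distribution of the site start in one Gibbs update:
    P(start = i | motif) is proportional to prior[i] * LR[i]."""
    log_post = np.log(prior) + log_odds_per_start(seq, pwm, background)
    log_post -= log_post.max()  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()


def sample_start(post, rng):
    """Draw a new site start from the conditional distribution."""
    return int(rng.choice(len(post), p=post))


# Toy example: two exact copies of the motif TGAC, at offsets 0 and 12.
W = 4
pwm = np.full((W, 4), 0.1)
pwm[np.arange(W), encode("TGAC")] = 0.7  # each row sums to 1.0
background = np.full(4, 0.25)
seq = "TGAC" + "AAAAAAAA" + "TGAC" + "AAAA"
n_starts = len(seq) - W + 1
uniform = np.full(n_starts, 1.0 / n_starts)

# Hypothetical occupancy track: the first 10 bp are likely nucleosome-bound.
occ = np.zeros(len(seq))
occ[:10] = 0.9

post_u = start_posterior(seq, pwm, background, uniform)
post_n = start_posterior(seq, pwm, background, occupancy_prior(occ, W))
rng = np.random.default_rng(0)
print(post_u[[0, 12]], post_n[[0, 12]], sample_start(post_n, rng))
```

Under the uniform prior the two motif copies are equally likely starts. The occupancy-weighted prior shifts the balance roughly tenfold toward the copy at offset 12, which lies in the region the toy track marks as nucleosome-free. Only the prior changes; the likelihood term and the sampling step are those of an ordinary site sampler.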

References

  1. Lee, C., Shibata, Y., Rao, B., Strahl, B., Lieb, J.: Evidence for nucleosome depletion at active regulatory regions genome-wide. Nature Genetics 36(8), 900–905 (2004)

  2. Segal, E., Fondufe-Mittendorf, Y., Chen, L., Thastrom, A., Field, Y., Moore, I., Wang, J., Widom, J.: A genomic code for nucleosome positioning. Nature 442(7104), 772–778 (2006)

  3. Harbison, C., et al.: Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104 (2004)

  4. Lee, T., et al.: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002)

  5. Liu, X., Noll, D., Lieb, J., Clarke, N.: DIP-chip: Rapid and accurate determination of DNA binding specificity. Genome Research 15(3), 421–427 (2005)

  6. Mukherjee, S., Berger, M., Jona, G., Wang, X., Muzzey, D., Snyder, M., Young, R., Bulyk, M.: Rapid analysis of the DNA binding specificities of transcription factors with DNA microarrays. Nature Genetics 36(12), 1331–1339 (2004)

  7. Spellman, P., Sherlock, G., Zhang, M., Iyer, V., Anders, K., Eisen, M., Brown, P., Botstein, D., Futcher, B.: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9, 3273–3297 (1998)

  8. Kim, S., Lund, J., Kiraly, M., Duke, K., Jiang, M., Stuart, J., Eizinger, A., Wylie, B., Davidson, G.: A gene expression map for Caenorhabditis elegans. Science 293, 2087–2092 (2001)

  9. Wasserman, W., Sandelin, A.: Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5(4), 276–287 (2004)

  10. Siggia, E.: Computational methods for transcriptional regulation. Current Opinion in Genetics and Development 15, 214–221 (2005)

  11. Workman, C., Stormo, G.: ANN-Spec: A method for discovering transcription factor binding sites with improved specificity. In: Pac. Symp. Biocomput., pp. 467–478 (2000)

  12. Segal, E., Barash, Y., Simon, I., Friedman, N., Koller, D.: From sequence to expression: A probabilistic framework. In: RECOMB ’02 (2002)

  13. Sinha, S.: Discriminative motifs. In: RECOMB ’02 (2002)

  14. Hong, P., Liu, X., Zhou, Q., Lu, X., Liu, J., Wong, W.: A boosting approach for motif modeling using ChIP-chip data. Bioinformatics 21(11), 2636–2643 (2005)

  15. Sinha, S.: On counting position weight matrix matches in a sequence, with application to discriminative motif finding. Bioinformatics 22(14), e454–e463 (2006)

  16. Tompa, M., et al.: Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 23(1), 137–144 (2005)

  17. Almer, A., Rudolph, H., Hinnen, A., Hörz, W.: Removal of positioned nucleosomes from the yeast PHO5 promoter upon PHO5 induction releases additional upstream activating DNA elements. EMBO J. 5, 2689–2696 (1986)

  18. Mai, X., Chou, S., Struhl, K.: Preferential accessibility of the yeast his3 promoter is determined by a general property of the DNA sequence, not by specific elements. Mol. Cell. Biol. 20, 6668–6676 (2000)

  19. Sekinger, E., Moqtaderi, Z., Struhl, K.: Intrinsic histone-DNA interactions and low nucleosome density are important for preferential accessibility of promoter regions in yeast. Mol. Cell 18, 735–748 (2005)

  20. Yuan, G., Liu, Y., Dion, M., Slack, M., Wu, L., Altschuler, S., Rando, O.: Genome-scale identification of nucleosome positions in S. cerevisiae. Science 309, 626–630 (2005)

  21. Staden, R.: Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Research 12, 505–519 (1984)

  22. Bailey, T., Elkan, C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: ISMB ’94, pp. 28–36. AAAI Press, Menlo Park (1994)

  23. Gelfand, A., Smith, A.: Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398–409 (1990)

  24. Liu, J.: The collapsed Gibbs sampler with applications to a gene regulation problem. Journal of the American Statistical Association 89, 958–966 (1994)

  25. Liu, J., Neuwald, A., Lawrence, C.: Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. Journal of the American Statistical Association 90, 1156–1170 (1995)

  26. Narlikar, L., Gordân, R., Ohler, U., Hartemink, A.: Informative priors based on transcription factor structural class improve de novo motif discovery. Bioinformatics 22(14), e384–e392 (2006)

  27. Roth, F., Hughes, J., Estep, P., Church, G.: Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol. 16, 939–945 (1998)

  28. Liu, X., Brutlag, D., Liu, J.: BioProspector: Discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. In: Pac. Symp. Biocomput., pp. 127–138 (2001)

  29. Thijs, G., Marchal, K., Lescot, M., Rombauts, S., De Moor, B., Rouze, P., Moreau, Y.: A Gibbs sampling method to detect over-represented motifs in the upstream regions of coexpressed genes. Journal of Computational Biology 9, 447–464 (2002)

  30. Dorrington, R.A., Cooper, T.G.: The DAL82 protein of Saccharomyces cerevisiae binds to the DAL upstream induction sequence (UIS). Nucleic Acids Research 21(16), 3777–3784 (1993)

  31. Jia, Y., Rothermel, B., Thornton, J., Butow, R.A.: A basic helix-loop-helix-leucine zipper transcription complex in yeast functions in a signaling pathway from mitochondria to the nucleus. Molecular and Cellular Biology 17, 1110–1117 (1997)

  32. Liu, X., Brutlag, D., Liu, J.: An algorithm for finding protein-DNA binding sites with applications to chromatin immunoprecipitation microarray experiments. Nat. Biotechnol. 20, 835–839 (2002)

  33. Kellis, M., Patterson, N., Endrizzi, M., Birren, B., Lander, E.: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423, 241–254 (2003)

  34. Bulyk, M., Johnson, P., Church, G.: Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Research 30, 1255–1261 (2002)

  35. Agarwal, P., Bafna, V.: Detecting non-adjacent correlations within signals in DNA. In: RECOMB ’98 (1998)

  36. Barash, Y., Elidan, G., Friedman, N., Kaplan, T.: Modeling dependencies in protein-DNA binding sites. In: RECOMB ’03 (2003)

  37. Miller, W., Makova, K., Nekrutenko, A., Hardison, R.: Comparative Genomics. Annu. Rev. Genomics Hum. Genet. 5, 15–56 (2004)

  38. Siddharthan, R., Siggia, E., van Nimwegen, E.: PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny. PLoS Comput. Biol. 1(7), e67 (2005)

Editor information

Terry Speed, Haiyan Huang

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Narlikar, L., Gordân, R., Hartemink, A.J. (2007). Nucleosome Occupancy Information Improves de novo Motif Discovery. In: Speed, T., Huang, H. (eds) Research in Computational Molecular Biology. RECOMB 2007. Lecture Notes in Computer Science, vol. 4453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71681-5_8

  • DOI: https://doi.org/10.1007/978-3-540-71681-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71680-8

  • Online ISBN: 978-3-540-71681-5

  • eBook Packages: Computer Science, Computer Science (R0)