Abstract
A complete understanding of transcriptional regulatory processes in the cell requires identification of transcription factor binding sites on a genome-wide scale. Unfortunately, these binding sites are typically short and degenerate, which poses a significant statistical challenge: the genome contains many more matches to known transcription factor binding sites than there are functional sites. Chromatin structure is known to play an important role in guiding transcription factors to the functional sites. In particular, it has been shown that active regulatory regions are usually depleted of nucleosomes, which enables transcription factors to bind DNA in those regions [1]. In this paper, we describe a novel algorithm that employs an informative prior over DNA sequence positions based on a discriminative view of nucleosome occupancy; the nucleosome occupancy information comes from a recently published computational model [2]. When a Gibbs sampling algorithm with our informative prior is applied to yeast sequence sets identified by ChIP-chip [3], the correct motif is found in 50% more cases than with an uninformative uniform prior. Moreover, if nucleosome occupancy information is not available, our informative prior reduces to a new kind of prior that can exploit discriminative information in a purely generative setting.
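The core idea, that a positional prior reweights where a Gibbs sampler places a motif occurrence, can be illustrated with the single-sequence update step. The sketch below is a minimal illustration under stated assumptions, not the paper's method: it assumes exactly one site per sequence, a uniform background, and a hypothetical prior that weights each candidate start by one minus the mean nucleosome occupancy of the window it covers. The paper's own prior is built from a discriminative view of occupancy and differs from this stand-in. All function names (`consensus_pwm`, `occupancy_prior`, `start_posterior`, `sample_start`) are illustrative.

```python
import math
import random

BASES = "ACGT"

def consensus_pwm(consensus, p=0.85):
    """Hypothetical helper: a PWM putting probability p on each consensus base."""
    other = (1.0 - p) / 3
    return [[p if b == c else other for b in BASES] for c in consensus]

def site_log_odds(seq, start, pwm, background):
    """Log-likelihood ratio of a motif occurrence at `start` vs. background."""
    score = 0.0
    for j, row in enumerate(pwm):
        k = BASES.index(seq[start + j])
        score += math.log(row[k]) - math.log(background[k])
    return score

def occupancy_prior(occupancy, width):
    """ASSUMED stand-in prior (not the paper's): weight each start by
    1 - mean occupancy over the window it covers, then normalize."""
    weights = []
    for i in range(len(occupancy) - width + 1):
        mean_occ = sum(occupancy[i:i + width]) / width
        weights.append(max(1.0 - mean_occ, 1e-9))  # keep weights strictly positive
    total = sum(weights)
    return [w / total for w in weights]

def start_posterior(seq, pwm, prior, background=(0.25,) * 4):
    """Posterior over the motif start, assuming exactly one site per sequence."""
    width = len(pwm)
    log_post = [math.log(prior[i]) + site_log_odds(seq, i, pwm, background)
                for i in range(len(seq) - width + 1)]
    m = max(log_post)  # subtract the max before exponentiating for stability
    unnorm = [math.exp(x - m) for x in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def sample_start(seq, pwm, prior, rng=None):
    """One Gibbs-style draw of the site position for a single sequence."""
    rng = rng or random.Random()
    post = start_posterior(seq, pwm, prior)
    return rng.choices(range(len(post)), weights=post, k=1)[0]

# Two identical TGAC matches: one in a low-occupancy region, one in a high one.
pwm = consensus_pwm("TGAC")
seq = "CC" + "TGAC" + "CCCCCC" + "TGAC" + "CC"
occupancy = [0.1] * 9 + [0.9] * 9
post_uniform = start_posterior(seq, pwm, [1 / 15] * 15)
post_occ = start_posterior(seq, pwm, occupancy_prior(occupancy, 4))
```

With the uniform prior the two matches at positions 2 and 12 receive equal posterior mass; with the occupancy-based prior the match in the nucleosome-depleted region is favored by the prior ratio (0.9/0.1 = 9 in this toy setup), which is the qualitative effect the abstract describes.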
References
Lee, C., Shibata, Y., Rao, B., Strahl, B., Lieb, J.: Evidence for nucleosome depletion at active regulatory regions genome-wide. Nature Genetics 36(8), 900–905 (2004)
Segal, E., Fondufe-Mittendorf, Y., Chen, L., Thastrom, A., Field, Y., Moore, I., Wang, J., Widom, J.: A genomic code for nucleosome positioning. Nature 442(7104), 772–778 (2006)
Harbison, C., et al.: Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104 (2004)
Lee, T., et al.: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002)
Liu, X., Noll, D., Lieb, J., Clarke, N.: DIP-chip: Rapid and accurate determination of DNA binding specificity. Genome Research 15(3), 421–427 (2005)
Mukherjee, S., Berger, M., Jona, G., Wang, X., Muzzey, D., Snyder, M., Young, R., Bulyk, M.: Rapid analysis of the DNA binding specificities of transcription factors with DNA microarrays. Nature Genetics 36(12), 1331–1339 (2004)
Spellman, P., Sherlock, G., Zhang, M., Iyer, V., Anders, K., Eisen, M., Brown, P., Botstein, D., Futcher, B.: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9, 3273–3297 (1998)
Kim, S., Lund, J., Kiraly, M., Duke, K., Jiang, M., Stuart, J., Eizinger, A., Wylie, B., Davidson, G.: A gene expression map for Caenorhabditis elegans. Science 293, 2087–2092 (2001)
Wasserman, W., Sandelin, A.: Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5(4), 276–287 (2004)
Siggia, E.: Computational methods for transcriptional regulation. Current Opinion in Genetics and Development 15, 214–221 (2005)
Workman, C., Stormo, G.: ANN-Spec: A method for discovering transcription factor binding sites with improved specificity. In: Pac. Symp. Biocomput., pp. 467–478 (2000)
Segal, E., Barash, Y., Simon, I., Friedman, N., Koller, D.: From sequence to expression: A probabilistic framework. In: RECOMB ’02 (2002)
Sinha, S.: Discriminative motifs. In: RECOMB ’02 (2002)
Hong, P., Liu, X., Zhou, Q., Lu, X., Liu, J., Wong, W.: A boosting approach for motif modeling using ChIP-chip data. Bioinformatics 21(11), 2636–2643 (2005)
Sinha, S.: On counting position weight matrix matches in a sequence, with application to discriminative motif finding. Bioinformatics 22(14), e454–e463 (2006)
Tompa, M., et al.: Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 23(1), 137–144 (2005)
Almer, A., Rudolph, H., Hinnen, A., Horz, W.: Removal of positioned nucleosomes from the yeast PHO5 promoter upon PHO5 induction releases additional upstream activating DNA elements. EMBO J. 5, 2689–2696 (1986)
Mai, X., Chou, S., Struhl, K.: Preferential accessibility of the yeast his3 promoter is determined by a general property of the DNA sequence, not by specific elements. Mol. Cell. Biol. 20, 6668–6676 (2000)
Sekinger, E., Moqtaderi, Z., Struhl, K.: Intrinsic histone-DNA interactions and low nucleosome density are important for preferential accessibility of promoter regions in yeast. Mol. Cell 18, 735–748 (2005)
Yuan, G., Liu, Y., Dion, M., Slack, M., Wu, L., Altschuler, S., Rando, O.: Genome-scale identification of nucleosome positions in S. cerevisiae. Science 309, 626–630 (2005)
Staden, R.: Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Research 12, 505–519 (1984)
Bailey, T., Elkan, C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: ISMB ’94, pp. 28–36. AAAI Press, Menlo Park (1994)
Gelfand, A., Smith, A.: Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398–409 (1990)
Liu, J.: The collapsed Gibbs sampler with applications to a gene regulation problem. Journal of the American Statistical Association 89, 958–966 (1994)
Liu, J., Neuwald, A., Lawrence, C.: Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. Journal of the American Statistical Association 90, 1156–1170 (1995)
Narlikar, L., Gordân, R., Ohler, U., Hartemink, A.: Informative priors based on transcription factor structural class improve de novo motif discovery. Bioinformatics 22(14), e384–e392 (2006)
Roth, F., Hughes, J., Estep, P., Church, G.: Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-genome mRNA quantitation. Nature Biotech. 16, 939–945 (1998)
Liu, X., Brutlag, D., Liu, J.: BioProspector: Discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. In: Pac. Symp. Biocomput., pp. 127–138 (2001)
Thijs, G., Marchal, K., Lescot, M., Rombauts, S., De Moor, B., Rouze, P., Moreau, Y.: A Gibbs sampling method to detect over-represented motifs in the upstream regions of coexpressed genes. Journal of Computational Biology 9, 447–464 (2002)
Dorrington, R.A., Cooper, T.G.: The DAL82 protein of Saccharomyces cerevisiae binds to the DAL upstream induction sequence (UIS). Nucleic Acids Research 21(16), 3777–3784 (1993)
Jia, Y., Rothermel, B., Thornton, J., Butow, R.A.: A basic helix-loop-helix-leucine zipper transcription complex in yeast functions in a signaling pathway from mitochondria to the nucleus. Molecular and Cellular Biology 17, 1110–1117 (1997)
Liu, X., Brutlag, D., Liu, J.: An algorithm for finding protein-DNA binding sites with applications to chromatin immunoprecipitation microarray experiments. Nature Biotech. 20, 835–839 (2002)
Kellis, M., Patterson, N., Endrizzi, M., Birren, B., Lander, E.: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423, 241–254 (2003)
Bulyk, M., Johnson, P., Church, G.: Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Research 30, 1255–1261 (2002)
Agarwal, P., Bafna, V.: Detecting non-adjacent correlations within signals in DNA. In: RECOMB ’98 (1998)
Barash, Y., Elidan, G., Friedman, N., Kaplan, T.: Modeling dependencies in protein-DNA binding sites. In: RECOMB ’03 (2003)
Miller, W., Makova, K., Nekrutenko, A., Hardison, R.: Comparative genomics. Annu. Rev. Genomics Hum. Genet. 5, 15–56 (2004)
Siddharthan, R., Siggia, E., van Nimwegen, E.: PhyloGibbs: A Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput. Biol. 1(7), e67 (2005)
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Narlikar, L., Gordân, R., Hartemink, A.J. (2007). Nucleosome Occupancy Information Improves de novo Motif Discovery. In: Speed, T., Huang, H. (eds) Research in Computational Molecular Biology. RECOMB 2007. Lecture Notes in Computer Science, vol 4453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71681-5_8
DOI: https://doi.org/10.1007/978-3-540-71681-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71680-8
Online ISBN: 978-3-540-71681-5
eBook Packages: Computer Science (R0)