Abstract
Discovering transcription factor (TF) binding motifs (TFBMs), the short cis-regulatory DNA sequences that play essential roles in transcriptional regulation, remains a challenge. We approach this problem from a steganographic perspective. We view the regulatory regions of a genome as a stegoscript in which conserved words (i.e., TFBMs) are embedded in a covertext, and we model the stegoscript with a statistical model consisting of a dictionary and a grammar. We develop an efficient algorithm, WordSpy, that learns such a model from a stegoscript and recovers the conserved motifs. We then select biologically meaningful motifs based on each motif's specificity to the set of genes of interest and/or the expression coherence of the genes whose promoters contain it. Applied to the promoters of 645 distinct cell-cycle-related genes of S. cerevisiae, our method identifies all known cell-cycle-related TFBMs among its top-ranking motifs. The method can also be applied directly to discriminative motif finding: using the ChIP-chip data of Lee et al., we predicted potential binding motifs for 113 known transcription factors of budding yeast.
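To make the "dictionary" idea concrete, here is a minimal sketch of how word probabilities can be learned from unsegmented sequences. It assumes the simplest possible grammar, in which a sequence is a concatenation of words drawn independently from the dictionary. This is the stochastic dictionary model of Bussemaker et al., and this sketch is not WordSpy, whose grammar may be richer. Expected word counts come from a forward/backward sum over all segmentations and feed one EM update. The function names `forward`, `backward`, and `em_step`, the candidate word `ACGCGT`, and the toy sequences are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sketch of a stochastic dictionary model: each sequence is a
# concatenation of words drawn independently from `probs` (the simplest
# "grammar"). Illustrative only; NOT the WordSpy implementation.


def forward(seq, probs, max_len):
    """f[i] = total probability of all segmentations of seq[:i] into words."""
    n = len(seq)
    f = [0.0] * (n + 1)
    f[0] = 1.0
    for i in range(1, n + 1):
        for k in range(1, min(max_len, i) + 1):
            p = probs.get(seq[i - k:i], 0.0)
            if p > 0.0:
                f[i] += f[i - k] * p
    return f


def backward(seq, probs, max_len):
    """b[i] = total probability of all segmentations of seq[i:] into words."""
    n = len(seq)
    b = [0.0] * (n + 1)
    b[n] = 1.0
    for i in range(n - 1, -1, -1):
        for k in range(1, min(max_len, n - i) + 1):
            p = probs.get(seq[i:i + k], 0.0)
            if p > 0.0:
                b[i] += p * b[i + k]
    return b


def em_step(seqs, probs):
    """One EM update of the word probabilities.

    Returns (new_probs, log-likelihood of seqs under the *input* probs).
    Assumes every sequence has at least one segmentation with nonzero
    probability (e.g., keep the single nucleotides in the dictionary).
    """
    max_len = max(len(w) for w in probs)
    expected = defaultdict(float)
    loglik = 0.0
    for seq in seqs:
        f = forward(seq, probs, max_len)
        b = backward(seq, probs, max_len)
        total = f[len(seq)]
        loglik += math.log(total)
        # Posterior expected number of times word w occupies seq[i:i+k].
        for i in range(len(seq)):
            for k in range(1, min(max_len, len(seq) - i) + 1):
                w = seq[i:i + k]
                p = probs.get(w, 0.0)
                if p > 0.0:
                    expected[w] += f[i] * p * b[i + k] / total
    # M-step: p(w) = E[count of w] / E[total number of words].
    norm = sum(expected.values())
    new_probs = {w: expected[w] / norm for w in probs}
    return new_probs, loglik


if __name__ == "__main__":
    seqs = ["ACGCGTACGCGT", "TTACGCGTAA"]  # toy sequences, not real promoters
    probs = {c: 0.2 for c in "ACGT"}
    probs["ACGCGT"] = 0.2  # hypothetical candidate word
    for it in range(10):
        probs, ll = em_step(seqs, probs)
        print(it, round(ll, 4), round(probs["ACGCGT"], 4))
```

On the toy input the candidate word absorbs most of the probability mass, because explaining its three occurrences as one word is far more likely than spelling them out letter by letter. Because probabilities are multiplied directly, this version underflows on promoter-length sequences; a practical version would work in log space or rescale at each position. Building the dictionary itself, i.e., proposing new candidate words, is not shown.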
References
Lemon, B., Tjian, R.: Orchestrated response: A symphony of transcription factors for gene control. Genes Dev. 14(20), 2551–2569 (2000)
Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F., Wootton, J.C.: Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science 262, 208–214 (1993)
Bailey, T.L., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using EM. Machine Learning 21(1-2), 51–80 (1995)
Hertz, G.Z., Stormo, G.D.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15(7-8), 563–577 (1999)
Hughes, J.D., Estep, P.W., Tavazoie, S., Church, G.M.: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Molecular Biology 296(5), 1205–1214 (2000)
van Helden, J., Andre, B., Collado-Vides, J.: Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Molecular Biology 281(5), 827–842 (1998)
Sinha, S., Tompa, M.: A statistical method for finding transcription factor binding sites. In: 8th Intern. Conf. on Intelligent Systems for Molecular Biology (2000)
Zhang, M.Q.: Large scale gene expression data analysis: A new challenge to computational biologists. Genome Research 9(8), 681–688 (1999)
Segal, E., Yelensky, R., Koller, D.: Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics 19, 273–282 (2003)
Tamada, Y., et al.: Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection. Bioinformatics 19, 227–236 (2003)
Wayner, P.: Disappearing Cryptography, 2nd edn. Morgan Kaufmann, San Francisco (2002)
Bussemaker, H.J., Li, H., Siggia, E.D.: Building a dictionary for genomes: Identification of presumptive regulatory sites by statistical analysis. Proc. Natl. Acad. Sci. USA 97(18), 10096–10100 (2000)
Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to Automata Theory, Languages, and Computation, 2nd edn. Addison-Wesley, Reading (2001)
Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)
Régnier, M.: A unified approach to word statistics. In: RECOMB, pp. 207–213 (1998)
Reinert, G., Schbath, S., Waterman, M.S.: Probabilistic and statistical properties of words: An overview. J. Computational Biology 7(1-2), 1–46 (2000)
Pilpel, Y., Sudarsanam, P., Church, G.M.: Identifying regulatory networks by combinatorial analysis of promoter elements. Nature Genetics 29(2), 153–159 (2001)
Spellman, P.T., et al.: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9, 3273–3297 (1998)
Dohrmann, P., Voth, W., Stillman, D.: Role of negative regulation in promoter specificity of the homologous transcriptional activators Ace2p and Swi5p. Mol. Cell. Biol. 16(4), 1746–1758 (1996)
Zhu, J., Zhang, M.Q.: SCPD: A Promoter Database of Yeast Saccharomyces cerevisiae. Bioinformatics 15, 607–611 (1999)
Kato, M., Hata, N., Banerjee, N., Futcher, B., Zhang, M.Q.: Identifying combinatorial regulation of transcription factors and binding motifs. Genome Biology 5, R56 (2004)
Dolan, J.W., Kirkman, C., Fields, S.: The yeast STE12 protein binds to the DNA sequence mediating pheromone induction. Proc. Natl. Acad. Sci. USA 86(15), 5703–5707 (1989)
Blaiseau, P.L., Thomas, D.: Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J. 17, 6327–6336 (1998)
van Helden, J., Andre, B., Collado-Vides, J.: A web site for the computational analysis of yeast regulatory sequences. Yeast 16(2), 177–187 (2000)
Stuart, J.M., Segal, E., Koller, D., Kim, S.K.: A gene coexpression network for global discovery of conserved genetic modules. Science 302(5643), 249–255 (2003)
Koch, C., Moll, T., Neuberg, M., Ahorn, H., Nasmyth, K.: A role for the transcription factors Mbp1 and Swi4 in progression from G1 to S phase. Science 261, 1551–1557 (1993)
Hollenhorst, P.C., Bose, M.E., Mielke, M.R., Müller, U., Fox, C.A.: Forkhead genes in transcriptional silencing, cell morphology and the cell cycle: Overlapping and distinct functions for FKH1 and FKH2 in Saccharomyces cerevisiae. Genetics 154, 1533–1548 (2000)
Lee, T.I., et al.: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002)
Gupta, M., Liu, J.: Discovery of conserved sequence patterns using a stochastic dictionary model. J. Amer. Statist. Assoc. 98, 55–66 (2003)
Sinha, S., van Nimwegen, E., Siggia, E.D.: A probabilistic method to detect regulatory modules. Bioinformatics 19, 292–301 (2003)
Kellis, M., Patterson, N., Endrizzi, M., Birren, B., Lander, E.S.: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937), 241–254 (2003)
Wasserman, W.W., Palumbo, M., Thompson, W., Fickett, J.W., Lawrence, C.E.: Human-mouse genome comparisons to locate regulatory sites. Nature Genetics 26(2), 225–228 (2000)
Siggia, E.D.: Computational methods for transcriptional regulation. Curr. Opin. Genet. Dev. 15, 214–221 (2005)
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Wang, G., Zhang, W. (2007). Build a Dictionary, Learn a Grammar, Decipher Stegoscripts, and Discover Genomic Regulatory Elements. In: Eskin, E., Ideker, T., Raphael, B., Workman, C. (eds) Systems Biology and Regulatory Genomics. RSB RRG 2005. Lecture Notes in Computer Science, vol. 4023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-48540-7_8
DOI: https://doi.org/10.1007/978-3-540-48540-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-48293-2
Online ISBN: 978-3-540-48540-7
eBook Packages: Computer Science, Computer Science (R0)