Abstract
DNA motif discovery is a much-explored problem in functional genomics. This paper describes a table-driven greedy algorithm for discovering regulatory motifs in the promoter sequences of co-expressed genes. The proposed algorithm searches both DNA strands for common patterns, or motifs. Its inputs are a set of promoter sequences, the motif length and a minimum Information Content. The algorithm generates all subsequences of the given length from the shortest input promoter sequence and stores these subsequences, together with their reverse complements, in a table. It then searches the remaining sequences for good matches to these subsequences, using the Information Content score to measure the quality of the resulting motifs. The algorithm has been tested on both synthetic and real data with promising results; in particular, it discovered meaningful motifs in muscle-specific regulatory sequences.
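The abstract gives only the outline of the method. The following Python sketch illustrates that outline: seeds are taken from the shortest sequence, stored with their reverse complements in a table, greedily matched against the remaining sequences, and scored by Information Content.

The sketch is not the authors' MotifMiner implementation, and several of its details are assumptions not taken from the paper:
- the match in each remaining sequence is the window with the smallest Hamming distance to the seed;
- Information Content is computed in bits against a uniform background of 0.25 per base;
- the names `reverse_complement`, `build_table`, `information_content` and `mine_motifs` are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def information_content(sites: list[str]) -> float:
    """IC in bits of equal-length aligned sites: the sum over columns of
    sum_b f(b) * log2(f(b) / 0.25). The uniform background is an assumption."""
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        counts = Counter(column)
        for base in BASES:
            f = counts[base] / n
            if f > 0:
                total += f * math.log2(f / 0.25)
    return total


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_table(shortest: str, length: int) -> dict[str, str]:
    """Seed table: each l-mer of the shortest sequence is labelled '+' and its
    reverse complement '-'. The first label recorded for a string is kept,
    so palindromic l-mers stay '+'."""
    table: dict[str, str] = {}
    for i in range(len(shortest) - length + 1):
        kmer = shortest[i:i + length]
        table.setdefault(kmer, "+")
        table.setdefault(reverse_complement(kmer), "-")
    return table


def mine_motifs(sequences: list[str], length: int, min_ic: float):
    """Greedy search (assumed matching rule): for each seed, take the closest
    window by Hamming distance in every remaining sequence, then score the
    aligned sites and keep those with IC >= min_ic."""
    shortest = min(sequences, key=len)
    if len(shortest) < length:
        return []
    others = list(sequences)
    others.remove(shortest)
    results = []
    for seed, strand in build_table(shortest, length).items():
        sites = [seed]
        for seq in others:
            windows = (seq[i:i + length] for i in range(len(seq) - length + 1))
            # Matching a '-' seed against the forward strand is equivalent to
            # matching the original l-mer against the reverse strand.
            sites.append(min(windows, key=lambda w: hamming(seed, w)))
        ic = information_content(sites)
        if ic >= min_ic:
            results.append((ic, seed, strand, sites))
    # Highest-scoring candidates first.
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

Consider the sequences AAGGATAA, CCCGGATCCC and TTTTATCCTT with a motif length of 4 and a minimum IC of 7.5 bits. Only the reverse-complement seed ATCC survives. It occurs exactly in all three sequences and reaches the maximum of 2 bits per column, 8 bits in total. This shows why the table stores reverse complements: the forward l-mer GGAT from the shortest sequence never appears on the forward strand of the third sequence.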
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Seeja, K.R., Alam, M.A., Jain, S.K. (2009). MotifMiner: A Table Driven Greedy Algorithm for DNA Motif Mining. In: Huang, DS., Jo, KH., Lee, HH., Kang, HJ., Bevilacqua, V. (eds) Emerging Intelligent Computing Technology and Applications. With Aspects of Artificial Intelligence. ICIC 2009. Lecture Notes in Computer Science, vol 5755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04020-7_42
DOI: https://doi.org/10.1007/978-3-642-04020-7_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04019-1
Online ISBN: 978-3-642-04020-7
eBook Packages: Computer Science (R0)