Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5755)

Abstract

DNA motif discovery is a much-explored problem in functional genomics. This paper describes a table-driven greedy algorithm for discovering regulatory motifs in the promoter sequences of co-expressed genes. The algorithm searches both DNA strands for common patterns, or motifs. Its inputs are a set of promoter sequences, the motif length, and the minimum Information Content. The algorithm generates subsequences of the given length from the shortest input promoter sequence and stores them, together with their reverse complements, in a table. It then searches the remaining sequences for good matches to these subsequences, using the Information Content score to measure the quality of each candidate motif. The algorithm has been tested on synthetic and real data with promising results, and it discovered meaningful motifs in muscle-specific regulatory sequences.
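The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how a "good match" is defined or which background model the Information Content score uses, so the sketch assumes the match with the fewest mismatches (Hamming distance) on either strand and a uniform background of 0.25 per base. All function names are hypothetical.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def build_table(seq, length):
    """Table of every subsequence of the given length and its reverse complement."""
    return {seq[i:i + length]: reverse_complement(seq[i:i + length])
            for i in range(len(seq) - length + 1)}

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_match(pattern, rc_pattern, seq):
    """Best-matching site for pattern in seq, searching both strands.

    Assumption: "good match" means fewest mismatches. A window that matches
    rc_pattern is a reverse-strand occurrence, so it is reported in the
    orientation of pattern.
    """
    length = len(pattern)
    best = None
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        d_fwd = hamming(window, pattern)
        d_rev = hamming(window, rc_pattern)
        if d_rev < d_fwd:
            candidate = (d_rev, reverse_complement(window))
        else:
            candidate = (d_fwd, window)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best[1]

def information_content(sites, background=0.25):
    """Sum over columns of f * log2(f / background); uniform background assumed."""
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        for base in "ACGT":
            f = column.count(base) / n
            if f > 0:
                total += f * math.log2(f / background)
    return total

def find_motifs(sequences, length, min_ic):
    """Greedy search seeded from the shortest sequence; keeps motifs with IC >= min_ic."""
    sequences = list(sequences)
    idx = min(range(len(sequences)), key=lambda i: len(sequences[i]))
    others = sequences[:idx] + sequences[idx + 1:]
    results = []
    for kmer, rc in build_table(sequences[idx], length).items():
        sites = [kmer] + [best_match(kmer, rc, s) for s in others]
        ic = information_content(sites)
        if ic >= min_ic:
            results.append((ic, kmer, sites))
    return sorted(results, key=lambda r: r[0], reverse=True)

# Toy input: the third sequence carries the motif on the reverse strand.
promoters = ["AAGATTCAA", "CCCGATTCCCC", "TTTGAATCTTTT"]
motifs = find_motifs(promoters, length=5, min_ic=10.0)
```

Seeding from the shortest sequence keeps the table small, since the number of candidate subsequences grows with the length of the seed sequence. With a uniform background, a fully conserved column contributes 2 bits, so a motif of length 5 can score at most 10 bits in this sketch.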

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Seeja, K.R., Alam, M.A., Jain, S.K. (2009). MotifMiner: A Table Driven Greedy Algorithm for DNA Motif Mining. In: Huang, DS., Jo, KH., Lee, HH., Kang, HJ., Bevilacqua, V. (eds) Emerging Intelligent Computing Technology and Applications. With Aspects of Artificial Intelligence. ICIC 2009. Lecture Notes in Computer Science, vol 5755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04020-7_42

  • DOI: https://doi.org/10.1007/978-3-642-04020-7_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04019-1

  • Online ISBN: 978-3-642-04020-7

  • eBook Packages: Computer Science (R0)
