Abstract
DNA motifs are short recurring patterns that are assumed to have some biological function. Most existing algorithms for discovering them are computationally prohibitive. In this paper we extend recent work on discovering exact (identical) string motifs. Our algorithm has three phases: the first reports all string motifs of all sizes; the second filters out the motifs that fail to meet our constraints; and the third ranks the remaining motifs using a combination of stochastic techniques and p-values. On benchmark data suites, our method outperforms other motif discovery algorithms, including well-known tools such as MEME and Weeder.
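The three-phase pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method, and every function name in it is hypothetical. Phase 1 uses naive substring enumeration (quadratic per sequence), not the linear-time exact-motif algorithm of Azmi and Al-Ssulami (2014). The Phase 2 constraints (a quorum of sequences and a filter against single-base runs) are assumptions. Phase 3 swaps the paper's combination of stochastic techniques and p-values for a simple binomial p-value under an i.i.d. background model that ignores overlapping occurrences.

```python
from collections import Counter
from itertools import chain
from math import comb, prod


def enumerate_motifs(seqs, min_len, max_len):
    """Phase 1 (naive stand-in): map every substring whose length lies in
    [min_len, max_len] to the set of indices of sequences containing it."""
    occ = {}
    for i, s in enumerate(seqs):
        for k in range(min_len, max_len + 1):
            for j in range(len(s) - k + 1):
                occ.setdefault(s[j:j + k], set()).add(i)
    return occ


def passes_constraints(motif, support, quorum):
    """Phase 2 (hypothetical constraints): keep a motif only if it occurs in
    at least `quorum` sequences and is not a single-base run such as 'AAAA'."""
    return len(support) >= quorum and len(set(motif)) > 1


def motif_pvalue(motif, bg, seq_len, n_seqs, k):
    """Phase 3 (assumed scoring): probability that at least k of n_seqs random
    i.i.d. sequences of length seq_len contain `motif`, treating positions as
    independent (overlapping occurrences are ignored)."""
    p_site = prod(bg[c] for c in motif)            # match at one fixed position
    positions = max(seq_len - len(motif) + 1, 0)
    p_seq = 1 - (1 - p_site) ** positions          # at least one match in a sequence
    return sum(comb(n_seqs, x) * p_seq**x * (1 - p_seq) ** (n_seqs - x)
               for x in range(k, n_seqs + 1))      # binomial upper tail P(X >= k)


def discover(seqs, min_len=4, max_len=8, quorum=None):
    """Run the three phases and return (motif, p-value) pairs, most
    significant first; ties are broken in favour of longer motifs."""
    n = len(seqs)
    quorum = n if quorum is None else quorum
    counts = Counter(chain.from_iterable(seqs))
    total = sum(counts.values())
    bg = {c: counts[c] / total for c in counts}    # background base frequencies
    seq_len = round(total / n)                     # mean sequence length
    ranked = []
    for motif, support in enumerate_motifs(seqs, min_len, max_len).items():
        if passes_constraints(motif, support, quorum):
            p = motif_pvalue(motif, bg, seq_len, n, len(support))
            ranked.append((p, -len(motif), motif))
    ranked.sort()
    return [(motif, p) for p, _, motif in ranked]


if __name__ == "__main__":
    seqs = ["TTGACGTCAAGT", "CGGACGTCATTA", "AGACGTCATGCG"]
    for motif, p in discover(seqs, min_len=5, max_len=7)[:3]:
        print(motif, f"{p:.2e}")
```

On the three toy sequences in the `__main__` block, the shared 7-mer GACGTCA ranks first: every other motif that meets the quorum is one of its substrings and is therefore more likely to occur by chance. Phase 1 is the expensive step in this sketch; the linear-time algorithm that the paper extends is not reproduced here.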
References
Azmi, A.M., Al-Ssulami, A.: A linear algorithm to discover exact string motifs. PLoS ONE 9(5), e95148 (2014)
Bailey, T.L., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Mach. Learning 21, 51–80 (1995)
Boeva, V., Clément, J., Régnier, M., Roytberg, M.A., Makeev, V.J.: Exact p-value calculation for heterotypic clusters of regulatory motifs and its application in computational annotation of cis-regulatory modules. Algorithms Mol. Biol. 2, 13 (2007)
Burset, M., Guigó, R.: Evaluation of gene structure prediction programs. Genomics 34, 353–367 (1996)
Buhler, J., Tompa, M.: Finding motifs using random projections. In: Proc. 5th Annual Int. Conf. on Comput. Biol. (RECOMB 2001), Montreal, Canada, pp. 69–76 (2001)
Chin, F., Leung, H.: An efficient algorithm for string motif discovery. In: Proc. 4th Asia-Pacific Bioinfor. Conf. (APBC 2006), Taipei, Taiwan, pp. 79–88 (2006)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press (2001)
Fauteux, F., Blanchette, M., Strömvik, M.V.: Seeder: discriminative seeding DNA motif discovery. Bioinfor. 24, 2303–2307 (2008)
GuhaThakurta, D.: Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic Acids Res. 34, 3585–3598 (2006)
Hu, J., Li, B., Kihara, D.: Limitations and potentials of current motif discovery algorithms. Nucleic Acids Res. 33, 4899–4913 (2005)
Karci, A.: Efficient automatic exact motif discovery algorithms for biological sequences. Expert Sys. With App. 36, 7952–7963 (2009)
Kaya, M.: MOGAMOD: Multi-objective genetic algorithm for motif discovery. Expert Sys. With App. 36, 1039–1047 (2009)
Marschall, T., Rahmann, S.: Efficient exact motif discovery. Bioinfor. 25, i356–i364 (2009)
Pavesi, G., Mereghetti, P., Mauri, G., Pesole, G.: Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res. 32, W199–W203 (2004)
Pevzner, P.A., Sze, S.H.: Combinatorial approaches to finding subtle signals in DNA sequences. In: Proc. Int. Conf. Intel. Sys. Mol. Biol., vol. 8, pp. 269–278 (2000)
Sandve, G.K., Abul, O., Walseng, V., Drabløs, F.: Improved benchmarks for computational motif discovery. BMC Bioinfor. 8, 163 (2007)
Sze, S.H., Zhao, X.: Improved pattern-driven algorithms for motif finding in DNA sequences. In: Eskin, E., Ideker, T., Raphael, B., Workman, C. (eds.) RECOMB 2005. LNCS (LNBI), vol. 4023, pp. 198–211. Springer, Heidelberg (2007)
Tompa, M., Li, N., Bailey, T.L., Church, G.M., De Moor, B., Eskin, E., Favorov, A.V., Frith, M.C., Fu, Y., Kent, W.J., Makeev, V.J., Mironov, A.A., Noble, W.S., Pavesi, G., Pesole, G., Régnier, M., Simonis, N., Sinha, S., Thijs, G., van Helden, J., Vandenbogaert, M., Weng, Z., Workman, C., Ye, C., Zhu, Z.: Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 23, 137–144 (2005)
Wingender, E., Dietze, P., Karas, H., Knüppel, R.: TRANSFAC: A database on transcription factors and their DNA binding sites. Nucleic Acids Res. 24, 238–241 (1996)
Yu, Q., Huo, H., Vitter, J.S., Huan, J., Nekrich, Y.: StemFinder: An efficient algorithm for searching large motif stems over large alphabets. In: Proc. IEEE Int. Conf. Bioinfor. and Biomed. (BIBM), Shanghai, China, pp. 473–476 (2013)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Al-Ssulami, A.M., Azmi, A.M. (2015). Towards a More Efficient Discovery of Biologically Significant DNA Motifs. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2015. Lecture Notes in Computer Science, vol 9043. Springer, Cham. https://doi.org/10.1007/978-3-319-16483-0_37
DOI: https://doi.org/10.1007/978-3-319-16483-0_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16482-3
Online ISBN: 978-3-319-16483-0
eBook Packages: Computer Science, Computer Science (R0)