Abstract
Identifying approximately repeated patterns, or motifs, in biological sequences from a set of co-regulated genes is an important step towards deciphering complex gene regulatory networks and understanding gene functions. In this work, we develop a novel motif finding algorithm based on Particle Swarm Optimization (PSO), a population-based stochastic optimization technique that has been shown to be effective on difficult multidimensional problems in continuous domains. We propose a modification of the standard PSO algorithm to handle discrete values, such as characters in DNA sequences. Our algorithm also provides several unique features. First, we use both consensus and position-specific weight matrix representations, taking advantage of the efficiency of the former and the accuracy of the latter. Second, many real motifs contain gaps, but existing methods usually ignore them or assume that the user knows their exact locations and lengths, which is usually impractical in real applications. In contrast, our method models gaps explicitly and provides an easy way to find gapped motifs without any detailed knowledge of the gaps. Third, our method allows some input sequences to contain zero or multiple binding sites. Experimental results on synthetic challenge problems as well as real biological sequences show that our method is both more efficient and more accurate than several existing algorithms, especially when gaps are present in the motifs.
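The ideas named in the abstract can be illustrated with a minimal sketch. It is not the algorithm of the paper: the abstract gives no update rule or scoring function, so every function name, parameter, and weight below is a hypothetical choice made for illustration. The sketch scans each sequence for the window closest to a gapped consensus (cheap), builds a position-specific weight matrix from the matched sites (more accurate), and shows one possible way to move a particle through a discrete alphabet by drawing each position from the current, personal-best, or global-best consensus.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"

def best_site(seq, left, right, gap):
    """Find the window in seq closest to the gapped consensus left + gap*N + right.

    Returns (mismatches, start, informative_bases), or None if seq is too short.
    Gap positions are skipped and never counted as mismatches.
    """
    span = len(left) + gap + len(right)
    target = left + right
    candidates = []
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        informative = window[:len(left)] + window[len(left) + gap:]
        mismatches = sum(a != b for a, b in zip(informative, target))
        candidates.append((mismatches, start, informative))
    return min(candidates) if candidates else None

def build_pwm(sites, pseudocount=1.0):
    """Column-wise base frequencies of aligned sites (hypothetical pseudocount)."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for col in range(len(sites[0])):
        counts = Counter(site[col] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm

def information_content(pwm, background=0.25):
    """Relative entropy of the matrix against a uniform background, in bits."""
    return sum(p * math.log2(p / background) for col in pwm for p in col.values())

def discrete_move(current, personal_best, global_best, rng, weights=(0.4, 0.3, 0.3)):
    """Assumed discrete analogue of a PSO step: each position is copied from
    the current, personal-best, or global-best consensus with the given weights."""
    return "".join(
        rng.choices((c, p, g), weights=weights)[0]
        for c, p, g in zip(current, personal_best, global_best)
    )

# Toy data: the pattern ACG-NN-TTA is planted once in each sequence.
sequences = ["GGACGCCTTAGG", "TACGAATTACCC", "CCCACGGTTTAA"]
hits = [best_site(s, "ACG", "TTA", gap=2) for s in sequences]
pwm = build_pwm([informative for _, _, informative in hits])
moved = discrete_move("ACGTTA", "ACGTTT", "TCGTTA", random.Random(0))
```

In a full search, the gap length would be one more discrete dimension of the particle rather than a fixed argument, which is how a method could find gapped motifs without being told where the gap is; the abstract does not say how the paper encodes this.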
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lei, C., Ruan, J. (2010). Finding Gapped Motifs by a Novel Evolutionary Algorithm. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2010. Lecture Notes in Computer Science, vol 6023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12211-8_5
DOI: https://doi.org/10.1007/978-3-642-12211-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12210-1
Online ISBN: 978-3-642-12211-8
eBook Packages: Computer Science, Computer Science (R0)