Finding Gapped Motifs by a Novel Evolutionary Algorithm

  • Conference paper
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6023)

Abstract

Identifying approximately repeated patterns, or motifs, in biological sequences from a set of co-regulated genes is an important step towards deciphering complex gene regulatory networks and understanding gene functions. In this work, we develop a novel motif finding algorithm based on a population-based stochastic optimization technique called Particle Swarm Optimization (PSO), which has been shown to be effective in optimizing difficult multidimensional problems in continuous domains. We propose a modification of the standard PSO algorithm to handle discrete values, such as characters in DNA sequences. Our algorithm also provides several unique features. First, we use both consensus and position-specific weight matrix representations in our algorithm, taking advantage of the efficiency of the former and the accuracy of the latter. Furthermore, many real motifs contain gaps, but existing methods usually ignore them or assume that the user knows their exact locations and lengths, which is usually impractical in real applications. In contrast, our method models gaps explicitly and provides a simple way to find gapped motifs without any detailed knowledge of the gaps. Our method also allows some input sequences to contain zero or multiple binding sites. Experimental results on synthetic challenge problems as well as real biological sequences show that our method is both more efficient and more accurate than several existing algorithms, especially when gaps are present in the motifs.
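To make the two motif representations named in the abstract concrete, here is a minimal sketch, not the authors' implementation. It builds a position-specific weight matrix from a handful of aligned sites, derives a consensus string from it, and scores a candidate window both ways: a cheap match count against the consensus and a log-odds score against the matrix. The function names, the pseudocount, the uniform background, and the single fixed-length gap are all assumptions made for illustration; the PSO search itself is not shown.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Column-wise base frequencies (with pseudocounts) for equal-length sites."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm

def consensus(pwm):
    """Most frequent base in each column of the matrix."""
    return "".join(max(ALPHABET, key=lambda b: col[b]) for col in pwm)

def match_count(cons, window):
    """Cheap score: number of positions where the window agrees with the consensus."""
    return sum(1 for a, b in zip(cons, window) if a == b)

def log_odds_score(pwm, window, background=0.25):
    """Finer score: sum of log2(frequency / background) over the window."""
    return sum(math.log2(pwm[i][b] / background) for i, b in enumerate(window))

def gapped_window(seq, start, left_len, gap_len, right_len):
    """Join the two motif halves, skipping gap_len bases between them (hypothetical gap model)."""
    left = seq[start:start + left_len]
    right_start = start + left_len + gap_len
    return left + seq[right_start:right_start + right_len]

# Hypothetical aligned sites, chosen only for illustration.
sites = ["ACGTAC", "ACGTAC", "ACGAAC", "TCGTAC"]
pwm = build_pwm(sites)
cons = consensus(pwm)

# A candidate site whose halves "ACG" and "TAC" are separated by a 3-base gap.
window = gapped_window("TTACGGGGTACGG", start=2, left_len=3, gap_len=3, right_len=3)
print(cons, window, match_count(cons, window), round(log_odds_score(pwm, window), 2))
```

The match count needs only one character comparison per position, while the log-odds score uses the full base distribution of each column, which mirrors the efficiency-versus-accuracy trade-off the abstract describes.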

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lei, C., Ruan, J. (2010). Finding Gapped Motifs by a Novel Evolutionary Algorithm. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2010. Lecture Notes in Computer Science, vol 6023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12211-8_5

  • DOI: https://doi.org/10.1007/978-3-642-12211-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12210-1

  • Online ISBN: 978-3-642-12211-8

  • eBook Packages: Computer Science, Computer Science (R0)
