Comparison of Simple Encoding Schemes in GA’s for the Motif Finding Problem: Preliminary Results

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4643)

Abstract

The DNA motif finding problem is of great relevance in molecular biology. The weak signals that mark transcription factor binding sites involved in gene regulation are considered challenging to find. These signals (motifs) are short strings of unknown length that can be located anywhere in the gene promoter region. The problem therefore consists of discovering short, conserved sites in genomic DNA without knowing, a priori, either the length or the chemical composition of the site. This turns the original problem into a combinatorial one, to which computational tools can be applied. Pevzner and Sze [14] studied a precise combinatorial formulation of this problem, called the planted motif problem, which is of particular interest because it is a challenging model for commonly used motif-finding algorithms [15]. In this work, we analyze two different encoding schemes for genetic algorithms that solve the planted motif problem. One representation encodes the starting position of the motif occurrence in each sequence, and the other encodes a candidate motif. We test the performance of both algorithms on a set of planted motif instances. Preliminary experimental results suggest that the algorithm encoding the candidate motif outperforms the more standard position-based scheme.
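The two encodings named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the fitness function, so total Hamming distance is assumed here as the cost to minimize, and every name (`plant_instance`, `position_cost`, `motif_cost`, the parameters `t`, `n`, `l`, `d`) is hypothetical. The instance generator follows the usual planted (l, d) setting, in which a length-l motif is inserted into each of t random sequences with exactly d substitutions.

```python
import random

ALPHABET = "ACGT"

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def plant_instance(t, n, l, d, rng):
    """Assumed planted (l, d) instance: t random sequences of length n,
    each holding one copy of a random l-mer with exactly d substitutions."""
    motif = "".join(rng.choice(ALPHABET) for _ in range(l))
    seqs, starts = [], []
    for _ in range(t):
        seq = [rng.choice(ALPHABET) for _ in range(n)]
        occ = list(motif)
        for i in rng.sample(range(l), d):
            occ[i] = rng.choice([c for c in ALPHABET if c != occ[i]])
        start = rng.randrange(n - l + 1)
        seq[start:start + l] = occ
        seqs.append("".join(seq))
        starts.append(start)
    return motif, seqs, starts

def consensus(words):
    """Column-wise most frequent letter (ties broken in ALPHABET order)."""
    return "".join(max(ALPHABET, key=col.count) for col in zip(*words))

def position_cost(positions, seqs, l):
    """Position encoding: the chromosome is one start index per sequence.
    Cost (assumed) = total distance of the selected l-mers to their consensus."""
    words = [s[p:p + l] for s, p in zip(seqs, positions)]
    c = consensus(words)
    return sum(hamming(c, w) for w in words)

def motif_cost(candidate, seqs):
    """Motif encoding: the chromosome is the candidate motif itself.
    Cost (assumed) = sum over sequences of the best-matching window distance."""
    l = len(candidate)
    return sum(
        min(hamming(candidate, s[i:i + l]) for i in range(len(s) - l + 1))
        for s in seqs
    )

if __name__ == "__main__":
    rng = random.Random(0)
    motif, seqs, starts = plant_instance(t=5, n=60, l=8, d=2, rng=rng)
    print("motif encoding cost:   ", motif_cost(motif, seqs))
    print("position encoding cost:", position_cost(starts, seqs, len(motif)))
```

Under this assumed cost, the position encoding searches a space of start-index vectors (one integer per sequence), whereas the motif encoding searches the space of l-mers and recovers the positions as a by-product of scoring. Lower cost is better in both cases.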

References

  1. Bailey, T., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21, 51–80 (1995)

  2. Blanchette, M., Schwikowski, B., Tompa, M.: Algorithms for phylogenetic footprinting. J. Comp. Biol. 9, 211–223 (2002)

  3. Brazma, A., Jonassen, I., Vilo, J., Ukkonen, E.: Predicting gene regulatory elements in silico on a genomic scale. Genome Res. 8, 1202–1215 (1998)

  4. Buhler, J., Martin, T.: Finding Motifs Using Random Projections. Journal of Computational Biology 9(2), 225–242 (2002)

  5. Che, D., Song, Y., Rasheed, K.: MDGA: Motif Discovery Using A Genetic Algorithm. GECCO’05 (June 25-29, 2005)

  6. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading (1989)

  7. Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)

  8. Hertz, G., Stormo, G.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563–577 (1999)

  9. Jones, N.C., Pevzner, P.A.: Introduction to Bioinformatics Algorithms. MIT Press, Cambridge (2004)

  10. Karaoglu, N., Maurer-Stroh, S., Manderick, B.: GAMOT: An efficient genetic algorithm for finding challenging motifs in DNA sequences. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P., Waterman, M. (eds.) RECOMB 2006. LNCS (LNBI), vol. 3909, Springer, Heidelberg (2006)

  11. Lawrence, C., Altschul, S., Boguski, M., Liu, J., Neuwald, A., Wootton, J.: Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208–214 (1993)

  12. Liu, F.M., Tsai, J.P., Chen, R.M., Chen, S.N., Shih, S.H.: FMGA: finding motifs by genetic algorithm. In: IEEE Fourth Symposium on Bioinformatics and Bioengineering (BIBE 2004), pp. 459–466. IEEE Computer Society Press, Los Alamitos (2004)

  13. Liu, X., Brutlag, D.L., Liu, J.S.: BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput. 6, 127–138 (2001)

  14. Pevzner, P., Sze, S.-H.: Combinatorial approaches to finding subtle signals in DNA sequences. In: Proc. 8th Int. Conf. Intelligent Systems for Molecular Biology, pp. 269–278 (2000)

  15. Price, A., Ramabhadran, S., Pevzner, P.: Finding Subtle Motifs by Branching from Sample Strings. Bioinformatics 1(1), 1–7 (2003)

  16. Roth, F.R., Hughes, J.D., Estep, P.E., Church, G.M.: Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-genome mRNA quantitation. Nature Biotechnology 16(10), 939–945 (1998)

  17. Sagot, M.-F.: Spelling approximate repeated or common motifs using a suffix tree. In: Lucchesi, C.L., Moura, A.V. (eds.) LATIN 1998. LNCS, vol. 1380, pp. 111–127. Springer, Heidelberg (1998)

  18. Sagot, M.-F., Escalier, V., Viari, A., Soldano, H.: Searching for repeated words in a text allowing for mismatches and gaps. In: Baeza-Yates, R., Manber, U. (eds.) Second South American Workshop on String Processing, Viña del Mar, Chile, pp. 87–100. University of Chile (1995)

  19. Sinha, S., Tompa, M.: A statistical Method for finding transcription factor binding sites. In: Proc. 8th Int. Conf. Intelligent Systems for Molecular Biology, pp. 344–354 (2000)

  20. Stormo, G.D., Hartzell III, G.W.: Identifying protein-binding sites from unaligned DNA fragments. PNAS 86, 1183–1187 (1989)

  21. Stavrovskaya, E.D., Mironov, A.A.: Two genetic algorithms for identification of regulatory signals. In Silico Biology (2003)

  22. Waterman, M.S., Arratia, R., Galas, D.J.: Pattern recognition in several sequences: consensus and alignment. Bull. Math. Biol. 46, 515–527 (1984)

  23. Zaslavsky, E., Singh, M.: A combinatorial optimization approach for diverse motif finding applications. Algorithms for Molecular Biology, 1–13 (2006)

Editor information

Marie-France Sagot, Maria Emilia M. T. Walter

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Martínez-Arellano, G., Brizuela, C.A. (2007). Comparison of Simple Encoding Schemes in GA’s for the Motif Finding Problem: Preliminary Results. In: Sagot, MF., Walter, M.E.M.T. (eds) Advances in Bioinformatics and Computational Biology. BSB 2007. Lecture Notes in Computer Science, vol 4643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73731-5_3

  • DOI: https://doi.org/10.1007/978-3-540-73731-5_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73730-8

  • Online ISBN: 978-3-540-73731-5

  • eBook Packages: Computer Science, Computer Science (R0)