Comparison of Simple Encoding Schemes in GA’s for the Motif Finding Problem: Preliminary Results

Martínez-Arellano, Giovanna; Brizuela, Carlos A.

doi:10.1007/978-3-540-73731-5_3

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4643)

Abstract

The DNA motif finding problem is of great relevance in molecular biology. Weak signals that mark transcription factor binding sites involved in gene regulation are considered challenging to find. These signals (motifs) consist of a short string of unknown length that can be located anywhere in the gene promoter region. The problem therefore consists of discovering short, conserved sites in genomic DNA without knowing, a priori, either the length or the chemical composition of the site. This turns the original problem into a combinatorial one, to which computational tools can be applied. Pevzner and Sze [7] studied a precise combinatorial formulation of this problem, called the planted motif problem, which is of particular interest because it is a challenging model for commonly used motif-finding algorithms [15]. In this work, we analyze two different encoding schemes for genetic algorithms to solve the planted motif finding problem. One representation encodes the initial position of the motif occurrence in each sequence, and the other encodes a candidate motif. We test the performance of both algorithms on a set of planted motif instances. Preliminary experimental results show promising, superior performance of the algorithm encoding the candidate motif over the more standard position-based scheme.
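
To make the two encodings concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a fixed motif length and uses two scoring functions that are common in the motif-finding literature: a column-wise consensus score for the position-based chromosome and a total Hamming distance for the motif-based chromosome. The function names, the toy sequences, and the choice of scores are all assumptions made for illustration; the abstract does not specify the fitness functions used in the paper.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def consensus_score(seqs, positions, length):
    """Position-based encoding (assumed scoring).

    The chromosome is one start index per sequence. The score sums, over
    each column of the aligned windows, the count of the most frequent
    nucleotide; higher is better.
    """
    windows = [s[p:p + length] for s, p in zip(seqs, positions)]
    return sum(Counter(col).most_common(1)[0][1] for col in zip(*windows))


def total_distance(seqs, motif):
    """Motif-based encoding (assumed scoring).

    The chromosome is the candidate motif itself. For each sequence we take
    the closest window of the same length and sum the Hamming distances;
    lower is better.
    """
    length = len(motif)
    return sum(
        min(hamming(motif, s[i:i + length]) for i in range(len(s) - length + 1))
        for s in seqs
    )


# Toy instance (hypothetical data): a length-4 motif with at most one mutation.
seqs = ["TTACGTTT", "GACGAGCC", "CCGTCGTA"]
position_chromosome = [2, 1, 3]   # one start index per sequence
motif_chromosome = "ACGT"         # a candidate motif

print(consensus_score(seqs, position_chromosome, 4))
print(total_distance(seqs, motif_chromosome))
```

In this sketch the position-based search space grows with the number and length of the sequences, while the motif-based search space depends only on the motif length and the alphabet size, which is one plausible reason the two encodings could behave differently under the same genetic operators.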

References

Bailey, T., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21, 51–80 (1995)
Blanchette, M., Schwikowski, B., Tompa, M.: Algorithms for phylogenetic footprinting. J. Comp. Biol. 9, 211–223 (2002)
Brazma, A., Jonassen, I., Vilo, J., Ukkonen, E.: Predicting gene regulatory elements in silico on a genomic scale. Genome Res. 15, 1202–1215 (1998)
Buhler, J., Martin, T.: Finding Motifs Using Random Projections. Journal of Computational Biology 9(2), 225–242 (2002)
Che, D., Song, Y., Rasheed, K.: MDGA: Motif Discovery Using A Genetic Algorithm. GECCO’05 (June 25-29, 2005)
Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading (1989)
Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Hertz, G., Stormo, G.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563–577 (1999)
Jones, N.C., Pevzner, P.A.: Introduction to Bioinformatics Algorithms. MIT Press, Cambridge (2004)
Karaoglu, N., Maurer-Stroh, S., Manderick, B.: GAMOT: An efficient genetic algorithm for finding challenging motifs in DNA sequences. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P., Waterman, M. (eds.) RECOMB 2006. LNCS (LNBI), vol. 3909, Springer, Heidelberg (2006)
Lawrence, C., Altschul, S., Bogusky, M., Liu, J., Neuwald, A., Wootton, J.: Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 208–214 (1993)
Liu, F.M., Tsai, J.P., Chen, R.M., Chen, S.N., Shih, S.H.: FMGA: finding motifs by genetic algorithm. In: IEEE Fourth Symposium on Bioinformatics and Bioengineering (BIBE 2004), pp. 459–466. IEEE Computer Society Press, Los Alamitos (2004)
Liu, X., Brutlag, D.L., Liu, J.S.: BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput. 6, 127–138 (2001)
Pevzner, P., Sze, S.-H.: Combinatorial approaches to finding subtle signals in DNA sequences. In: Proc. 8th Int. Conf. Intelligent Systems for Molecular Biology, pp. 269–278 (2000)
Price, A., Ramabhadram, S., Pevzner, P.: Finding Subtle Motifs by Branching from Sample Strings. Bioinformatics 1(1), 1–7 (2003)
Roth, F.R., Hughes, J.D., Estep, P.E., Church, G.M.: Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-genome mRNA quantitation. Nature Biotechnology 16(10), 939–945 (1998)
Sagot, M.-F.: Spelling approximate repeated or common motifs using a suffix tree. In: Lucchesi, C.L., Moura, A.V. (eds.) LATIN 1998. LNCS, vol. 1380, pp. 111–127. Springer, Heidelberg (1998)
Sagot, M.-F., Escalier, V., Viari, A., Soldano, H.: Searching for repeated words in a text allowing for mismatches and gaps. In: Baeza-Yates, R., Manber, U. (eds.) Second South American Workshop on String Processing, Viña del Mar, Chile, pp. 87–100. University of Chile (1995)
Sinha, S., Tompa, M.: A statistical method for finding transcription factor binding sites. In: Proc. 8th Int. Conf. Intelligent Systems for Molecular Biology, pp. 344–354 (2000)
Stormo, G.D., Hartzell III, G.W.: Identifying protein-binding sites from unaligned DNA fragments. PNAS 86, 1183–1187 (1989)
Stavrovskaya, E.D., Mironov, A.A.: Two genetic algorithms for identification of regulatory signals. In Silico Biology (2003)
Waterman, M.S., Arratia, R., Galas, D.J.: Pattern recognition in several sequences: consensus and alignment. Bull. Math. Biol. 46, 515–527 (1984)
Zaslavsky, E., Singh, M.: A combinatorial optimization approach for diverse motif finding applications. Algorithms for Molecular Biology, 1–13 (2006)

Author information

Authors and Affiliations

Computer Sciences Department, CICESE Research Center, Km 107 Carr. Tijuana-Ensenada, Ensenada, B.C., México
Giovanna Martínez-Arellano & Carlos A. Brizuela

Editor information

Marie-France Sagot, Maria Emilia M. T. Walter

Cite this paper

Martínez-Arellano, G., Brizuela, C.A. (2007). Comparison of Simple Encoding Schemes in GA’s for the Motif Finding Problem: Preliminary Results. In: Sagot, MF., Walter, M.E.M.T. (eds) Advances in Bioinformatics and Computational Biology. BSB 2007. Lecture Notes in Computer Science, vol 4643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73731-5_3

DOI: https://doi.org/10.1007/978-3-540-73731-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73730-8
Online ISBN: 978-3-540-73731-5
eBook Packages: Computer Science, Computer Science (R0)
