Abstract
Recognizing transcription factor binding sites (TFBSs, or DNA motifs) in order to understand the regulation of gene expression is one of the major challenges of the post-genomics era. Computational approaches to binding site discovery, based on either brute-force search or heuristic search algorithms, have been developed, and a number of them have achieved some degree of success. However, the prediction accuracy of these algorithms can be affected by the inherently low signal-to-noise ratio of DNA sequences. In this paper, we propose a novel DNA motif discovery approach based on a genetic algorithm and explore ways to improve its performance. We use publicly available motif models, such as the Position Frequency Matrix (PFM), to initialize the population. By considering both the conservation and the complexity of DNA motifs, we develop a novel fitness function that better evaluates the motif models during the evolution process. A final model refinement step is also introduced to optimize the motif models. The experimental results demonstrate comparable or superior performance of our approach relative to two recently proposed genetic algorithm motif discovery approaches.
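To make the terms in the abstract concrete, the following is a minimal sketch of how a PFM can be built from aligned sites and scored for conservation and complexity. It is not the fitness function of the paper, which the abstract does not specify. The function names, the uniform background, the use of information content as the conservation measure, the composition-entropy complexity measure, and the product used to combine them are all assumptions chosen for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"  # column order used for every PFM row below


def position_frequency_matrix(sites):
    """Count each base at each column of equal-length aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for col in range(width):
        counts = Counter(s[col] for s in sites)
        pfm.append([counts.get(b, 0) for b in BASES])
    return pfm


def information_content(pfm, background=(0.25, 0.25, 0.25, 0.25)):
    """Conservation: relative entropy (bits) summed over columns.

    A uniform background is assumed here; real genomes are rarely uniform.
    """
    total = 0.0
    for column in pfm:
        n = sum(column)
        for count, bg in zip(column, background):
            if count:
                p = count / n
                total += p * math.log2(p / bg)
    return total


def complexity(pfm):
    """Illustrative complexity: entropy of the overall base composition,
    scaled to [0, 1]. Low values flag repeats such as AAAA."""
    totals = [sum(col[i] for col in pfm) for i in range(len(BASES))]
    n = sum(totals)
    entropy = -sum((t / n) * math.log2(t / n) for t in totals if t)
    return entropy / 2.0  # 2 bits is the maximum for four bases


def toy_fitness(pfm):
    """Hypothetical combination: reward conserved but non-trivial motifs."""
    return information_content(pfm) * complexity(pfm)
```

Under these assumptions, a perfectly conserved low-complexity model such as repeated `AAAA` sites scores zero, while an equally conserved `ACGT` model keeps its full information content. This shows why a fitness function based on conservation alone can favor uninformative repeats.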
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, X., Wang, D. (2009). An Improved Genetic Algorithm for DNA Motif Discovery with Public Domain Information. In: Köppen, M., Kasabov, N., Coghill, G. (eds) Advances in Neuro-Information Processing. ICONIP 2008. Lecture Notes in Computer Science, vol 5506. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02490-0_64
DOI: https://doi.org/10.1007/978-3-642-02490-0_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02489-4
Online ISBN: 978-3-642-02490-0
eBook Packages: Computer Science (R0)