An Improved Genetic Algorithm for DNA Motif Discovery with Public Domain Information

Li, Xi; Wang, Dianhui

doi:10.1007/978-3-642-02490-0_64

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5506)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Recognizing transcription factor binding sites (TFBSs, or DNA motifs) to help understand the regulation of gene expression is one of the major challenges of the post-genomic era. Computational approaches to binding site discovery have been developed based on brute-force search techniques or heuristic search algorithms, and a number of them have achieved some degree of success. However, prediction accuracy can be adversely affected by the naturally low signal-to-noise ratio of DNA sequences. In this paper, we propose a novel DNA motif discovery approach based on a genetic algorithm and explore ways to improve its performance. We use publicly available motif models, such as Position Frequency Matrices (PFMs), to initialize the population. By considering both the conservation and the complexity of DNA motifs, we develop a novel fitness function that better evaluates motif models during the evolution process. A final model refinement process is also introduced to optimize the motif models. The experimental results demonstrate that our approach achieves performance comparable or superior to that of two recently proposed genetic algorithm motif discovery approaches.
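The abstract does not give the fitness function itself, so the following is only an illustrative sketch of how a motif model might be scored on both conservation and complexity. It builds a PFM from aligned sites, measures conservation as information content against a uniform background, and measures complexity as the normalized entropy of the overall base composition. The function names, the uniform background, and the product used to combine the two terms are all assumptions made for illustration; they are not the paper's formulation.

```python
import math

BASES = "ACGT"


def pfm_from_sites(sites):
    """Build a position frequency matrix: one base-count dict per column."""
    width = len(sites[0])
    pfm = [dict.fromkeys(BASES, 0) for _ in range(width)]
    for site in sites:
        if len(site) != width:
            raise ValueError("all sites must have the same length")
        for col, base in enumerate(site.upper()):
            pfm[col][base] += 1
    return pfm


def information_content(pfm, background=0.25):
    """Conservation: summed relative entropy (bits) of each column
    against a uniform background probability (an assumption here)."""
    total_bits = 0.0
    for column in pfm:
        n = sum(column.values())
        for count in column.values():
            if count:
                p = count / n
                total_bits += p * math.log2(p / background)
    return total_bits


def composition_complexity(pfm):
    """Complexity: entropy of the motif's overall base composition,
    scaled to [0, 1] (2 bits is the maximum for four bases)."""
    totals = {b: sum(col[b] for col in pfm) for b in BASES}
    n = sum(totals.values())
    entropy = 0.0
    for count in totals.values():
        if count:
            p = count / n
            entropy -= p * math.log2(p)
    return entropy / 2.0


def toy_fitness(pfm):
    """Hypothetical combination: conservation weighted by complexity,
    so perfectly conserved but repetitive motifs score low."""
    return information_content(pfm) * composition_complexity(pfm)


if __name__ == "__main__":
    for sites in (["ACGT"] * 4, ["AAAA"] * 4, ["ACGT", "ACGA", "TCGT", "ACGT"]):
        print(sites, round(toy_fitness(pfm_from_sites(sites)), 3))
```

Under these assumptions, four identical copies of `ACGT` reach the maximum of 8 bits with complexity 1.0, while four copies of `AAAA` are equally conserved but receive a fitness of 0 because their composition has zero entropy. Penalizing low-complexity candidates in this way is one reason a fitness function might combine the two criteria.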

Author information

Authors and Affiliations

Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, VIC, 3086, Australia
Xi Li & Dianhui Wang
Department of Primary Industries, Bioscience Research Division, Victorian AgriBiosciences Centre, 1 Park Drive, La Trobe Research and Development Park, Bundoora, VIC, 3083, Australia
Xi Li

Editor information

Editors and Affiliations

Kyushu Institute of Technology, Network Design and Research Center, 680-4 Fukuoka, 820-8502, Kawazu, Iizuka, Japan
Mario Köppen
Knowledge Engineering and Discovery Research Institute (KEDRI), School of Computing and Mathematical Sciences, Auckland University of Technology, 350 Queen Street, 10110, Auckland, New Zealand
Nikola Kasabov
Department of Electrical and Computer Engineering, Robotics Laboratory, Auckland University of Technology, 38 Princes Street, 1142, Auckland, New Zealand
George Coghill

About this paper

Cite this paper

Li, X., Wang, D. (2009). An Improved Genetic Algorithm for DNA Motif Discovery with Public Domain Information. In: Köppen, M., Kasabov, N., Coghill, G. (eds) Advances in Neuro-Information Processing. ICONIP 2008. Lecture Notes in Computer Science, vol 5506. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02490-0_64

DOI: https://doi.org/10.1007/978-3-642-02490-0_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02489-4
Online ISBN: 978-3-642-02490-0
eBook Packages: Computer Science, Computer Science (R0)
