An Improved Genetic Algorithm for DNA Motif Discovery with Public Domain Information

  • Conference paper
Advances in Neuro-Information Processing (ICONIP 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5506)

Abstract

Recognition of transcription factor binding sites (TFBSs, or DNA motifs) is key to understanding the regulation of gene expression and remains one of the major challenges of the post-genomics era. Computational approaches based on brute-force search or heuristic search algorithms have been developed for binding site discovery, and a number of them have achieved some degree of success. However, prediction accuracy can be adversely affected by the inherently low signal-to-noise ratio of DNA sequences. In this paper, a novel DNA motif discovery approach using a genetic algorithm is proposed to explore ways of improving algorithm performance. We use publicly available motif models, such as Position Frequency Matrices (PFMs), to initialize the population. By considering both the conservation and the complexity of DNA motifs, a novel fitness function is developed to better evaluate the motif models during the evolution process. A final model refinement process is also introduced to optimize the motif models. The experimental results demonstrate that our approach performs comparably to, or better than, two recently proposed genetic algorithm approaches to motif discovery.
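As a rough illustration of the motif model mentioned in the abstract, the sketch below builds a Position Frequency Matrix from a set of aligned binding sites and scores its conservation as information content in bits. It is not the fitness function proposed in the paper, whose definition is not given here. The function names, the example sites, and the uniform background (0.25 per nucleotide) are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_frequency_matrix(sites):
    """Count each nucleotide at each position of equal-length aligned sites.

    Returns one row per motif position, with counts ordered as in ALPHABET.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for column in zip(*sites):
        counts = Counter(column)
        pfm.append([counts.get(base, 0) for base in ALPHABET])
    return pfm


def information_content(pfm, background=0.25):
    """Total information content in bits.

    Assumes a uniform background probability (an illustrative choice,
    not taken from the paper).
    """
    total = 0.0
    for counts in pfm:
        n = sum(counts)
        for count in counts:
            if count > 0:
                p = count / n
                total += p * math.log2(p / background)
    return total


# Hypothetical aligned sites, used only to exercise the functions.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pfm = position_frequency_matrix(sites)
score = information_content(pfm)
```

A fully conserved column contributes 2 bits under this background and a column with all four nucleotides equally frequent contributes 0, so higher totals indicate more conserved motif models.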

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, X., Wang, D. (2009). An Improved Genetic Algorithm for DNA Motif Discovery with Public Domain Information. In: Köppen, M., Kasabov, N., Coghill, G. (eds) Advances in Neuro-Information Processing. ICONIP 2008. Lecture Notes in Computer Science, vol 5506. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02490-0_64

  • DOI: https://doi.org/10.1007/978-3-642-02490-0_64

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02489-4

  • Online ISBN: 978-3-642-02490-0

  • eBook Packages: Computer Science, Computer Science (R0)
