Motif GibbsGA: Sampling Transcription Factor Binding Sites Coupled with PSFM Optimization by GA

  • Conference paper
Advances in Computation and Intelligence (ISICA 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5821)


Abstract

Identifying transcription factor binding sites (TFBSs), or motifs, plays an important role in deciphering the mechanisms of gene regulation. Although many experimental and computational methods have been developed, finding TFBSs remains a challenging problem. We propose a novel sampling-based motif-finding method, Motif GibbsGA, which couples Gibbs sampling with optimization of a position-specific frequency matrix (PSFM) by a genetic algorithm. Working from the PSFM motif model, the method first chooses the initial PSFM parameters with a greedy strategy and then builds a Gibbs sampler with respect to that model. During sampling, the PSFM is improved by a genetic algorithm. A post-processing step that adaptively adds and removes instances handles the general case of an arbitrary number of instances per sequence. We test our method on the benchmark dataset compiled by Tompa et al. for assessing computational tools that predict TFBSs. On this dataset, the performance of Motif GibbsGA compares well to, and in many cases exceeds, that of existing tools, which we attribute in part to the genetic algorithm's improvement of the PSFM.
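To make the sampling component concrete, the following is a minimal sketch of a basic Gibbs site sampler over a PSFM, in the general style of the classic site sampler (one site per sequence). It is not the authors' implementation: the abstract does not specify the greedy initialization, the genetic algorithm operators, or the post-processing, so none of those are shown. The function names (`build_psfm`, `site_weight`, `gibbs_sample`), the pseudocount value, the uniform background, and the random initialization are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"

def build_psfm(sites, pseudocount=0.5):
    """Column-wise base frequencies of equal-length sites (with pseudocounts)."""
    width = len(sites[0])
    psfm = []
    for j in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        psfm.append({b: counts[b] / total for b in ALPHABET})
    return psfm

def site_weight(window, psfm, background):
    """Likelihood ratio of a window under the PSFM versus the background."""
    weight = 1.0
    for j, base in enumerate(window):
        weight *= psfm[j][base] / background[base]
    return weight

def gibbs_sample(seqs, width, iters=200, seed=0):
    """Sample one site per sequence; needs at least two sequences."""
    rng = random.Random(seed)
    background = {b: 0.25 for b in ALPHABET}  # assumed uniform background
    # Random start positions (assumption; the paper uses a greedy initialization).
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, seq in enumerate(seqs):
            # Build the PSFM from the current sites of all other sequences.
            others = [seqs[k][pos[k]:pos[k] + width]
                      for k in range(len(seqs)) if k != i]
            psfm = build_psfm(others)
            # Resample the site in sequence i in proportion to its weight.
            starts = range(len(seq) - width + 1)
            weights = [site_weight(seq[p:p + width], psfm, background)
                       for p in starts]
            pos[i] = rng.choices(starts, weights=weights, k=1)[0]
    sites = [s[p:p + width] for s, p in zip(seqs, pos)]
    return pos, build_psfm(sites)
```

A call such as `gibbs_sample(["ACGTTGACAGT", "TTTGACAGGCA", "CCATTGACATG"], width=6)` returns the sampled start positions and the PSFM built from the sampled sites. In a scheme like the one the abstract describes, a genetic algorithm would refine that PSFM during sampling; how that is done is left open here.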


References

  1. Bailey, T.L., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21, 51–80 (1995)

  2. Ao, W., Gaudet, J., Kent, W.J., Muttumu, S., Mango, S.E.: Environmentally induced foregut remodeling by PHA-4/FoxA and DAF-12/NHR. Science 305, 1743–1746 (2004)

  3. Hughes, J.D., Estep, P.W., Tavazoie, S., Church, G.M.: Computational identification of cis-regulatory elements associated with functionally coherent groups of genes in Saccharomyces cerevisiae. J. Mol. Biol. 296, 1205–1214 (2000)

  4. Liu, X., Brutlag, D.L., Liu, J.S.: BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. In: Pac. Symp. Biocomput. vol. 6, pp. 127–138 (2001)

  5. Thijs, G., et al.: A Gibbs sampling method to detect overrepresented motifs in the upstream regions of co-expressed genes. J. Comput. Biol. 9, 447–464 (2002)

  6. Frith, M.C., Hansen, U., Spouge, J.L., Weng, Z.: Finding functional sequence elements by multiple local alignment. Nucleic Acids Research 32, 189–200 (2004)

  7. Liang, K.C., Wang, X.D., Anastassiou, D.: A profile-based deterministic sequential Monte Carlo algorithm for motif discovery. Bioinformatics 24, 46–55 (2008)

  8. Hertz, G., Stormo, G.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563–577 (1999)

  9. Wei, Z., Jensen, S.T.: GAME: detecting cis-regulatory elements using a genetic algorithm. Bioinformatics 22, 1577–1584 (2006)

  10. Chan, T.M., Leung, K.S., Lee, K.H.: TFBS identification based on genetic algorithm with combined representations and adaptive post-processing. Bioinformatics 24, 341–349 (2008)

  11. Frith, M.C., Fu, Y., Yu, L., et al.: Detection of functional DNA motifs via statistical over-representation. Nucleic Acids Research 32, 1372–1381 (2004)

  12. Mahony, S., Hendrix, D., Golden, A., Smith, T.J., Rokhsar, D.S.: Transcription factor binding site identification using the self-organizing map. Bioinformatics 21, 1807–1814 (2005)

  13. Tompa, M., Li, N., Bailey, T.L., Church, G.M., De Moor, B., Eskin, E.: Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology 23, 137–144 (2005)

  14. Hu, J., Li, B., Kihara, D.: Limitations and potentials of current motif discovery algorithms. Nucleic Acids Research 33, 4899–4913 (2005)

  15. Wijaya, E., Yiu, S.-M., Son, N.T., et al.: MotifVoter: a novel ensemble method for fine-grained integration of generic motif finders. Bioinformatics 24, 2288–2295 (2008)

  16. Li, L., Liang, Y., Bass, R.L.: GAPWM: a genetic algorithm method for optimizing a position weight matrix. Bioinformatics 23, 1188–1194 (2007)

  17. Bailey, T.L., Gribskov, M.: Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14, 48–54 (1998)

  18. Lawrence, C.E., et al.: Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment. Science 262, 208–214 (1993)

  19. da Fonseca, P.G.S., Gautier, C., Guimaraes, K.S., Sagot, M.-F.: Efficient representation and P-value computation for high-order Markov motifs. Bioinformatics 24, i160–i166 (2008)

  20. Casimiro, A.C., Vinga, S., Freitas, A.T., Oliveira, A.L.: An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance. BMC Bioinformatics 9, 89 (2008)

  21. Shen, L., Liu, J., Wang, W.: GBNet: Deciphering regulatory rules in the co-regulated genes using a Gibbs sampler enhanced Bayesian network approach. BMC Bioinformatics 9, 395 (2008)


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, L., Jiao, L. (2009). Motif GibbsGA: Sampling Transcription Factor Binding Sites Coupled with PSFM Optimization by GA. In: Cai, Z., Li, Z., Kang, Z., Liu, Y. (eds) Advances in Computation and Intelligence. ISICA 2009. Lecture Notes in Computer Science, vol 5821. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04843-2_5

  • DOI: https://doi.org/10.1007/978-3-642-04843-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04842-5

  • Online ISBN: 978-3-642-04843-2

  • eBook Packages: Computer Science, Computer Science (R0)
