Abstract
Identifying transcription factor binding sites (TFBSs), or motifs, plays an important role in deciphering the mechanisms of gene regulation. Although many experimental and computational methods have been developed, finding TFBSs remains a challenging problem. We propose a novel sampling-based motif finding method, Motif GibbsGA, whose distinguishing feature is the coupling of Gibbs sampling with optimization of the position-specific frequency matrix (PSFM) by a genetic algorithm. The method uses the PSFM as its motif model and chooses the initial PSFM parameters with a greedy strategy. A Gibbs sampler is then built on the PSFM model, and during sampling the PSFM is improved by the genetic algorithm. A post-processing step with adaptive adding and removing of instances handles the general case of an arbitrary number of instances per sequence. We test the method on the benchmark dataset compiled by Tompa et al. for assessing computational tools that predict TFBSs. On this dataset, Motif GibbsGA performs comparably to existing tools and in many cases better, which we attribute in part to the genetic algorithm's improvement of the PSFM.
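To make the sampling step concrete, the following is a minimal sketch of a site sampler driven by a PSFM. It is not the authors' implementation: it assumes exactly one site per sequence, a uniform background, and a fixed pseudocount, and it omits the greedy initialization, the genetic-algorithm refinement of the PSFM, and the adaptive post-processing. The names `build_psfm`, `site_weight`, and `gibbs_sweep`, and the toy sequences, are hypothetical.

```python
import random

ALPHABET = "ACGT"

def build_psfm(sites, pseudocount=1.0):
    """Column-wise base frequencies of aligned sites, smoothed by a pseudocount."""
    width = len(sites[0])
    psfm = []
    for j in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for s in sites:
            counts[s[j]] += 1
        total = sum(counts.values())
        psfm.append({b: counts[b] / total for b in ALPHABET})
    return psfm

def site_weight(psfm, segment, background=0.25):
    """Likelihood ratio of a segment under the PSFM versus a uniform background."""
    w = 1.0
    for col, base in zip(psfm, segment):
        w *= col[base] / background
    return w

def gibbs_sweep(seqs, positions, width, rng):
    """One sweep: resample each sequence's site given the sites in all other sequences."""
    for i, seq in enumerate(seqs):
        # PSFM estimated from every sequence except the one being resampled.
        others = [s[p:p + width]
                  for k, (s, p) in enumerate(zip(seqs, positions)) if k != i]
        psfm = build_psfm(others)
        starts = list(range(len(seq) - width + 1))
        weights = [site_weight(psfm, seq[p:p + width]) for p in starts]
        # Draw the new start position in proportion to its weight.
        positions[i] = rng.choices(starts, weights=weights, k=1)[0]
    return positions

# Toy usage: three short sequences and a motif width of 8 (illustrative values only).
rng = random.Random(0)
seqs = ["TTGACGTCATT", "CCTGACGTCAG", "ATGACGTCAAA"]
width = 8
positions = [rng.randrange(len(s) - width + 1) for s in seqs]
for _ in range(50):
    gibbs_sweep(seqs, positions, width, rng)
```

In a scheme like the one the abstract describes, the PSFM produced during sampling would additionally be passed to an optimizer (here, a genetic algorithm) before the next sweep; that step is left out of this sketch.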
References
Bailey, T.L., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21, 51–80 (1995)
Ao, W., Gaudet, J., Kent, W.J., Muttumu, S., Mango, S.E.: Environmentally induced foregut remodeling by PHA-4/FoxA and DAF-12/NHR. Science 305, 1743–1746 (2004)
Hughes, J.D., Estep, P.W., Tavazoie, S., Church, G.M.: Computational identification of cis-regulatory elements associated with functionally coherent groups of genes in Saccharomyces cerevisiae. J. Mol. Biol. 296, 1205–1214 (2000)
Liu, X., Brutlag, D.L., Liu, J.S.: BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. In: Pac. Symp. Biocomput. vol. 6, pp. 127–138 (2001)
Thijs, G., et al.: A Gibbs sampling method to detect overrepresented motifs in the upstream regions of co-expressed genes. J. Comput. Biol. 9, 447–464 (2002)
Frith, M.C., Hansen, U., Spouge, J.L., Weng, Z.: Finding functional sequence elements by multiple local alignment. Nucleic Acids Research 32, 189–200 (2004)
Liang, K.C., Wang, X.D., Anastassiou, D.: A profile-based deterministic sequential Monte Carlo algorithm for motif discovery. Bioinformatics 24, 46–55 (2008)
Hertz, G., Stormo, G.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563–577 (1999)
Wei, Z., Jensen, S.T.: GAME: detecting cis-regulatory elements using a genetic algorithm. Bioinformatics 22, 1577–1584 (2006)
Chan, T.M., Leung, K.S., Lee, K.H.: TFBS identification based on genetic algorithm with combined representations and adaptive post-processing. Bioinformatics 24, 341–349 (2008)
Frith, M.C., Fu, Y., Yu, L., et al.: Detection of functional DNA motifs via statistical over-representation. Nucleic Acids Research 32, 1372–1381 (2004)
Mahony, S., Hendrix, D., Golden, A., Smith, T.J., Rokhsar, D.S.: Transcription factor binding site identification using the self-organizing map. Bioinformatics 21, 1807–1814 (2005)
Tompa, M., Li, N., Bailey, T.L., Church, G.M., De Moor, B., Eskin, E.: Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology 23, 137–144 (2005)
Hu, J., Li, B., Kihara, D.: Limitations and potentials of current motif discovery algorithms. Nucleic Acids Research 33, 4899–4913 (2005)
Wijaya, E., Yiu, S.-M., Son, N.T., et al.: MotifVoter: a novel ensemble method for fine-grained integration of generic motif finders. Bioinformatics 24, 2288–2295 (2008)
Li, L., Liang, Y., Bass, R.L.: GAPWM: a genetic algorithm method for optimizing a position weight matrix. Bioinformatics 23, 1188–1194 (2007)
Bailey, T.L., Gribskov, M.: Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14, 48–54 (1998)
Lawrence, C.E., et al.: Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment. Science 262, 208–214 (1993)
da Fonseca, P.G.S., Gautier, C., Guimaraes, K.S., Sagot, M.-F.: Efficient representation and P-value computation for high-order Markov motifs. Bioinformatics 24, i160–i166 (2008)
Casimiro, A.C., Vinga, S., Freitas, A.T., Oliveira, A.L.: An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance. BMC Bioinformatics 9, 89 (2008)
Shen, L., Liu, J., Wang, W.: GBNet: Deciphering regulatory rules in the co-regulated genes using a Gibbs sampler enhanced Bayesian network approach. BMC Bioinformatics 9, 395 (2008)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, L., Jiao, L. (2009). Motif GibbsGA: Sampling Transcription Factor Binding Sites Coupled with PSFM Optimization by GA. In: Cai, Z., Li, Z., Kang, Z., Liu, Y. (eds) Advances in Computation and Intelligence. ISICA 2009. Lecture Notes in Computer Science, vol 5821. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04843-2_5
DOI: https://doi.org/10.1007/978-3-642-04843-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04842-5
Online ISBN: 978-3-642-04843-2
eBook Packages: Computer Science, Computer Science (R0)