Abstract
Regulation of gene expression is central to the genetic mechanisms of organisms, and motif identification is an important step in constructing gene expression regulatory networks. This work constructs a position weight matrix based on the Gibbs sampling method and, building on it, proposes a motif recognition method based on a genetic algorithm. A scoring function is defined to update the population and obtain a converged position weight matrix, which enables the identification of motifs of different lengths. Simulated and experimental data sets were used to evaluate the accuracy and execution time of the algorithm.
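The abstract describes the pipeline only at a high level. The Python sketch below shows one way the pieces could fit together; it is not the authors' implementation. Each candidate solution is a vector of motif start positions, one per sequence. The sketch makes several assumptions. The scoring function is the information content of the resulting position weight matrix, measured against a uniform 0.25 background with a pseudocount of 0.5. Selection is truncation selection with single-point crossover. The mutation operator is a Gibbs site-sampler step: leave one sequence out, rebuild the matrix from the remaining sites, and resample that sequence's start in proportion to the PWM-to-background likelihood ratio. All function names (`build_pwm`, `fitness`, `gibbs_mutate`, `crossover`, `find_motif`) are hypothetical. The code requires Python 3.8+ for `math.prod`.

```python
import math
import random

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumption: uniform background base frequency


def build_pwm(sites, pseudocount=0.5):
    """Position weight matrix: per-column base probabilities with pseudocounts."""
    pwm = []
    for j in range(len(sites[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in ALPHABET})
    return pwm


def fitness(seqs, starts, w):
    """Hypothetical scoring function: information content (bits) of the PWM."""
    pwm = build_pwm([s[p:p + w] for s, p in zip(seqs, starts)])
    return sum(col[b] * math.log2(col[b] / BACKGROUND)
               for col in pwm for b in ALPHABET)


def gibbs_mutate(seqs, starts, w, rng):
    """Gibbs site-sampler step: leave one sequence out, rebuild the PWM from
    the other sites, and resample that sequence's start position in
    proportion to the PWM-versus-background likelihood ratio."""
    i = rng.randrange(len(seqs))
    others = [s[p:p + w] for k, (s, p) in enumerate(zip(seqs, starts)) if k != i]
    pwm = build_pwm(others)
    positions = range(len(seqs[i]) - w + 1)
    weights = [math.prod(col[b] / BACKGROUND for col, b in zip(pwm, seqs[i][p:p + w]))
               for p in positions]
    child = list(starts)
    child[i] = rng.choices(positions, weights=weights)[0]
    return child


def crossover(a, b, rng):
    """Single-point crossover of two start-position vectors."""
    cut = rng.randrange(1, len(a)) if len(a) > 1 else 0
    return a[:cut] + b[cut:]


def find_motif(seqs, w, pop_size=30, generations=60, seed=0):
    """Evolve start-position vectors; return the best starts and their sites."""
    rng = random.Random(seed)
    pop = [[rng.randrange(len(s) - w + 1) for s in seqs] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(seqs, ind, w), reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection keeps the top half
        children = [gibbs_mutate(seqs, crossover(*rng.sample(elite, 2), rng), w, rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    best = max(pop, key=lambda ind: fitness(seqs, ind, w))
    return best, [s[p:p + w] for s, p in zip(seqs, best)]


if __name__ == "__main__":
    # Toy data: plant the same 7-mer at a random offset in random sequences.
    rng = random.Random(1)
    motif = "TGACTCA"
    seqs = []
    for _ in range(8):
        s = "".join(rng.choice(ALPHABET) for _ in range(40))
        p = rng.randrange(len(s) - len(motif) + 1)
        seqs.append(s[:p] + motif + s[p + len(motif):])
    starts, sites = find_motif(seqs, len(motif))
    print(starts, sites)
```

The motif width `w` is fixed for each run of the sketch. One simple way to look for motifs of different lengths is to call `find_motif` once for each candidate width. The raw information-content scores are not directly comparable across widths, though, because they sum over columns and therefore tend to grow as the motif gets wider.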

Cite this article
Sheng, X., Wang, K. Motif identification method based on Gibbs sampling and genetic algorithm. Cluster Comput 20, 33–41 (2017). https://doi.org/10.1007/s10586-016-0699-x