Abstract
Transcription factors (TFs) strongly influence gene transcription. By binding to specific DNA regions, called TF binding sites (TFBSs), TFs promote or inhibit transcription and thereby help form complex gene expression regulation systems. Recent years have seen the rapid development of deep learning (DL) methods in natural language processing (NLP) and computer vision (CV), where they outperform earlier state-of-the-art methods. Many scholars have applied these methods to motif discovery, e.g., DeepBind and DanQ, but these methods use only the raw DNA sequence as input. Rather than making the model more complex, the massive biological data produced by high-throughput sequencing technology suggests a different approach. In this paper, we propose a simple and effective DL-based model, DeepCR, which integrates multiple-omics data to predict TFBSs. Experiments on 21 motif datasets of the GM12878 cell line from in-vitro protein binding microarray data show that multiple-omics data can significantly improve overall performance. Specifically, the average AUC improves by 3.89% with histone modifications, 3.77% with MeDIP-seq, and 6.63% with histone modifications and MeDIP-seq together. The mean AR increases by 3.90% with histone modifications, 4.50% with MeDIP-seq, and 6.00% with histone modifications and MeDIP-seq together.
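The abstract does not state how DeepCR combines the DNA sequence with the additional omics signals. As a minimal sketch of one common way to do this, the example below one-hot encodes a sequence and appends per-base signal tracks as extra input channels. The function names (`one_hot`, `build_input`), the per-base signal values, and the channel-stacking scheme itself are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Return a (len(seq), 4) array; unknown bases such as N become all-zero rows."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out

def build_input(seq, *signal_tracks):
    """Stack one-hot sequence channels with per-base omics signal tracks.

    Hypothetical helper: each track is assumed to hold one value per base,
    e.g. a histone-modification or MeDIP-seq signal aligned to the sequence.
    """
    channels = [one_hot(seq)]
    for track in signal_tracks:
        track = np.asarray(track, dtype=np.float32)
        if track.shape != (len(seq),):
            raise ValueError("each signal track must have one value per base")
        channels.append(track[:, None])
    return np.concatenate(channels, axis=1)

# Toy values, invented purely to show the shapes involved.
seq = "ACGTN"
histone = [0.1, 0.4, 0.9, 0.3, 0.0]
medip = [0.0, 0.2, 0.8, 0.1, 0.0]
x = build_input(seq, histone, medip)
print(x.shape)  # (5, 6): 4 sequence channels + 2 signal channels
```

An array of this shape could be fed to a convolutional network in the same way as a sequence-only input, which is what makes comparing sequence-only and multiple-omics variants straightforward.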
Acknowledgements
This work was supported by grants from the National Key R&D Program of China (Nos. 2018YFA0902600 and 2018AAA0100100); partly supported by the National Natural Science Foundation of China (Grant nos. 61732012, 62002266, 61932008, and 62073231) and the Introduction Plan of High-end Foreign Experts (Grant no. G2021033002L); and supported by the Key Project of Science and Technology of Guangxi (Grant no. 2021AB20147), the Guangxi Natural Science Foundation (Grant nos. 2021JJA170204 and 2021JJA170199), and the Guangxi Science and Technology Base and Talents Special Project (Grant nos. 2021AC19354 and 2021AC19394).
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, Y., Yuan, C., Wu, H., Zhao, X. (2022). Using Deep Learning to Predict Transcription Factor Binding Sites Based on Multiple-omics Data. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2022. Lecture Notes in Computer Science, vol 13393. Springer, Cham. https://doi.org/10.1007/978-3-031-13870-6_65
DOI: https://doi.org/10.1007/978-3-031-13870-6_65
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13869-0
Online ISBN: 978-3-031-13870-6
eBook Packages: Computer Science, Computer Science (R0)