Using Deep Learning to Predict Transcription Factor Binding Sites Based on Multiple-omics Data

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13393)

Abstract

Transcription factors (TFs) strongly influence gene transcription. By binding to specific DNA regions, called TF binding sites (TFBSs), TFs promote or inhibit gene expression and thereby help form complex gene regulatory systems. Recent years have seen the rapid development of deep learning (DL) methods in natural language processing (NLP) and computer vision (CV), where they outperform earlier state-of-the-art methods. Many scholars have applied these methods to motif discovery, e.g., DeepBind and DanQ, but these methods use only the raw DNA sequence as input. Rather than making the model more complex, the massive biological data produced by high-throughput sequencing technology suggest a different approach. In this paper, we propose a simple and effective DL-based model, namely DeepCR, that integrates multiple-omics data to predict TFBSs. Experiments on 21 motif datasets of the GM12878 cell line from in-vitro protein binding microarray data show that multiple-omics data can significantly improve overall performance. More specifically, the average AUC improves by 3.89% with histone modifications, 3.77% with MeDIP-seq, and 6.63% with histone modifications and MeDIP-seq together. The mean AR increases by 3.90% with histone modifications, 4.50% with MeDIP-seq, and 6.00% with histone modifications and MeDIP-seq together.
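The abstract does not say how DeepCR combines the DNA sequence with the omics signals, so the following is only an illustrative sketch of one common way to build such an input, not the authors' method. It assumes each omics track (here the hypothetical names `histone` and `medip`, with made-up values) provides one number per base, and appends these as extra columns to a one-hot encoding of the sequence.

```python
from typing import Dict, List, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Return a len(seq) x 4 one-hot matrix; bases outside ACGT (e.g. N) map to all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def stack_omics(seq: str, tracks: Dict[str, Sequence[float]]) -> List[List[float]]:
    """Append one column per omics track to the one-hot sequence matrix.

    Assumption (not from the paper): every track is a per-base signal
    with the same length as the sequence.
    """
    for name, signal in tracks.items():
        if len(signal) != len(seq):
            raise ValueError(
                f"track {name!r} has length {len(signal)}, expected {len(seq)}"
            )
    rows = one_hot(seq)
    for i, row in enumerate(rows):
        # Columns follow the insertion order of the tracks dictionary.
        row.extend(float(signal[i]) for signal in tracks.values())
    return rows


# Hypothetical example: a 4-base window with two made-up signal tracks.
features = stack_omics(
    "ACGN",
    {"histone": [0.1, 0.2, 0.3, 0.4], "medip": [1, 0, 0, 1]},
)
print(len(features), len(features[0]))  # prints: 4 6
```

A matrix of this shape (positions by channels) could then be fed to a convolutional model in the same way as a sequence-only one-hot matrix, with the channel count raised from 4 to 4 plus the number of tracks.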

Acknowledgements

This work was supported by the National Key R&D Program of China (Nos. 2018YFA0902600 and 2018AAA0100100), partly supported by the National Natural Science Foundation of China (Grant nos. 61732012, 62002266, 61932008, and 62073231) and the Introduction Plan of High-end Foreign Experts (Grant no. G2021033002L), and also supported by the Key Project of Science and Technology of Guangxi (Grant no. 2021AB20147), the Guangxi Natural Science Foundation (Grant nos. 2021JJA170204 and 2021JJA170199), and the Guangxi Science and Technology Base and Talents Special Project (Grant nos. 2021AC19354 and 2021AC19394).

Author information

Corresponding author

Correspondence to Youhong Xu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, Y., Yuan, C., Wu, H., Zhao, X. (2022). Using Deep Learning to Predict Transcription Factor Binding Sites Based on Multiple-omics Data. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2022. Lecture Notes in Computer Science, vol 13393. Springer, Cham. https://doi.org/10.1007/978-3-031-13870-6_65

  • DOI: https://doi.org/10.1007/978-3-031-13870-6_65

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13869-0

  • Online ISBN: 978-3-031-13870-6

  • eBook Packages: Computer Science, Computer Science (R0)
