Skip to main content

Combining Sequence and Epigenomic Data to Predict Transcription Factor Binding Sites Using Deep Learning

  • Conference paper
  • First Online:
Bioinformatics Research and Applications (ISBRA 2018)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 10847))

Included in the following conference series:

Abstract

Knowing the transcription factor binding sites (TFBSs) is essential for modeling the underlying binding mechanisms and follow-up cellular functions. Convolutional neural networks (CNNs) have outperformed methods in predicting TFBSs from the primary DNA sequence. In addition to DNA sequences, histone modifications and chromatin accessibility are also important factors influencing their activity. They have been explored to predict TFBSs recently. However, current methods rarely take into account histone modifications and chromatin accessibility using CNN in an integrative framework. To this end, we developed a general CNN model to integrate these data for predicting TFBSs. We systematically benchmarked a series of architecture variants by changing network structure in terms of width and depth, and explored the effects of sample length at flanking regions. We evaluated the performance of the three types of data and their combinations using 256 ChIP-seq experiments and also compared it with competing machine learning methods. We find that contributions from these three types of data are complementary to each other. Moreover, the integrative CNN framework is superior to traditional machine learning methods with significant improvements.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Mitchell, P.J., Tjian, R.: Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science 245, 371–378 (1989)

    Article  Google Scholar 

  2. Junion, G., Spivakov, M., Girardot, C., Braun, M., Gustafson, E.H., Birney, E., Furlong, E.E.: A transcription factor collective defines cardiac cell fate and reflects lineage history. Cell 148, 473–486 (2012)

    Article  Google Scholar 

  3. Vaquerizas, J.M., Kummerfeld, S.K., Teichmann, S.A., Luscombe, N.M.: A census of human transcription factors: function, expression and evolution. Nature Rev. Genet. 10, 252–263 (2009)

    Article  Google Scholar 

  4. Lee, T.I., Young, R.A.: Transcriptional regulation and its misregulation in disease. Cell 152, 1237–1251 (2013)

    Article  Google Scholar 

  5. Neph, S., Vierstra, J., Stergachis, A.B., Reynolds, A.P., Haugen, E., Vernot, B., Thurman, R.E., John, S., Sandstrom, R., Johnson, A.K.: An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012)

    Article  Google Scholar 

  6. Gilfillan, G.D., Hughes, T., Sheng, Y., Hjorthaug, H.S., Straub, T., Gervin, K., Harris, J.R., Undlien, D.E., Lyle, R.: Limitations and possibilities of low cell number ChIP-seq. BMC Genom. 13, 645 (2012)

    Article  Google Scholar 

  7. Park, P.J.: ChIP–seq: advantages and challenges of a maturing technology. Nature Rev. Genet. 10, 669–680 (2009)

    Article  Google Scholar 

  8. Warner, J.B., Philippakis, A.A., Jaeger, S.A., He, F.S., Lin, J., Bulyk, M.L.: Systematic identification of mammalian regulatory motifs’ target genes and functions. Nat. Methods 5, 347–353 (2008)

    Article  Google Scholar 

  9. Ghandi, M., Lee, D., Mohammad-Noori, M., Beer, M.A.: Enhanced regulatory sequence prediction using gapped k-mer features. PLoS Comput. Biol. 10, e1003711 (2014)

    Article  Google Scholar 

  10. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521, 436–444 (2015)

    Article  Google Scholar 

  11. Angermueller, C., Lee, H.J., Reik, W., Stegle, O.: DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 18, 67 (2017)

    Article  Google Scholar 

  12. Qin, Q., Feng, J.: Imputation for transcription factor binding predictions based on deep learning. PLoS Comput. Biol. 13, e1005403 (2017)

    Article  Google Scholar 

  13. Yang, B., Liu, F., Ren, C., Ouyang, Z., Xie, Z., Bo, X., Shu, W.: BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone. Bioinformatics 33, 1930–1936 (2017)

    Article  Google Scholar 

  14. Kelley, D.R., Snoek, J., Rinn, J.L.: Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26(7), 990–999 (2016)

    Article  Google Scholar 

  15. Zeng, H., Edwards, M.D., Liu, G., Gifford, D.K.: Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 32, i121–i127 (2016)

    Article  Google Scholar 

  16. Jurtz, V.I., Johansen, A.R., Nielsen, M., Almagro Armenteros, J.J., Nielsen, H., Sønderby, C.K., Winther, O., Sønderby, S.K.: An introduction to deep learning on biological sequence data: examples and solutions. Bioinformatics 33, 3685–3690 (2017)

    Article  Google Scholar 

  17. Liu, Q., Xia, F., Yin, Q., Jiang, R.: Chromatin accessibility prediction via a hybrid deep convolutional neural network. Bioinformatics 34(5), 732–738 (2017). https://doi.org/10.1093/bioinformatics/btx679

    Article  Google Scholar 

  18. Min, X., Zeng, W., Chen, N., Chen, T., Jiang, R.: Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding. Bioinformatics 33, i92–i101 (2017)

    Article  Google Scholar 

  19. Bu, H., Gan, Y., Wang, Y., Zhou, S., Guan, J.: A new method for enhancer prediction based on deep belief network. BMC Bioinform. 18, 418 (2017)

    Article  Google Scholar 

  20. Zhang, J., Peng, W., Wang, L.: LeNup: learning nucleosome positioning from DNA sequences with improved convolutional neural networks. Bioinformatics 34(10), 1705–1712 (2018). https://doi.org/10.1093/bioinformatics/bty003

    Article  Google Scholar 

  21. Piqueregi, R., Degner, J.F., Pai, A.A., Gaffney, D.J., Gilad, Y., Pritchard, J.K.: Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011)

    Article  Google Scholar 

  22. Xin, B., Rohs, R.: Relationship between histone modifications and transcription factor binding is protein family specific. Genome Res. (2018). https://doi.org/10.1101/gr.220079.116

  23. Min, X., Zeng, W., Chen, S., Chen, N., Chen, T., Jiang, R.: Predicting enhancers with deep convolutional neural networks. BMC Bioinform. 18, 478 (2017)

    Article  Google Scholar 

  24. Zhou, J., Troyanskaya, O.G.: Predicting effects of noncoding variants with deep learning–based sequence model. Nat. Methods 12, 931–934 (2015)

    Article  Google Scholar 

  25. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M.: TensorFlow: a system for large-scale machine learning. In: OSDI 2016, pp. 265–283 (2016)

    Google Scholar 

  26. Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Kheradpour, P., Zhang, Z., Heravi-Moussavi, A., Liu, Y., Amin, V.: Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015)

    Article  Google Scholar 

  27. Ziller, M.J., Edri, R., Yaffe, Y., Donaghey, J., Pop, R., Mallard, W., Issner, R., Gifford, C.A., Goren, A., Xing, J.: Dissecting neural differentiation regulatory networks through epigenetic footprinting. Nature 518, 355–359 (2015)

    Article  Google Scholar 

  28. Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)

    MathSciNet  MATH  Google Scholar 

  29. Zeiler, M.D.: ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012)

  30. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

    MathSciNet  MATH  Google Scholar 

Download references

Acknowledgement

Fang Jing would like to thank the support of the National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, CAS, during his visit. The work was supported by the National Natural Science Foundation of China [No. 61473232 and 91430111 to SWZ; No. 61621003 and 11661141019 to SZ]; the Strategic Priority Research Program of the Chinese Academy of Sciences (CAS) [No. XDB13040600], the Key Research Program of the Chinese Academy of Sciences, [No. KFZD-SW-219] and CAS Frontier Science Research Key Project for Top Young Scientist [No. QYZDB-SSW-SYS008].

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Shao-Wu Zhang or Shihua Zhang .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Jing, F., Zhang, SW., Cao, Z., Zhang, S. (2018). Combining Sequence and Epigenomic Data to Predict Transcription Factor Binding Sites Using Deep Learning. In: Zhang, F., Cai, Z., Skums, P., Zhang, S. (eds) Bioinformatics Research and Applications. ISBRA 2018. Lecture Notes in Computer Science(), vol 10847. Springer, Cham. https://doi.org/10.1007/978-3-319-94968-0_23

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-94968-0_23

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94967-3

  • Online ISBN: 978-3-319-94968-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics