
DeepTF: Accurate Prediction of Transcription Factor Binding Sites by Combining Multi-scale Convolution and Long Short-Term Memory Neural Network

  • Conference paper
Intelligence Science and Big Data Engineering. Big Data and Machine Learning (IScIDE 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11936)

Abstract

Transcription factor binding sites (TFBSs), one class of DNA-protein binding sites, play an important role in understanding gene expression regulation and in drug design. Deep-learning-based methods have recently been widely used to predict TFBSs. In this work, we propose a novel deep-learning model, called Combination of Multi-Scale Convolutional Network and Long Short-Term Memory Network (MCNN-LSTM), which uses multi-scale convolution for feature processing and a long short-term memory (LSTM) network to recognize TFBSs in DNA sequences. Moreover, we design a new encoding method, called multi-nucleotide one-hot (MNOH), which accounts for the correlation between nucleotides at adjacent positions, to further improve TFBS prediction performance. Stringent cross-validation and independent tests on benchmark datasets demonstrated the efficacy of MNOH and MCNN-LSTM. Based on these methods, we further implement a new TFBS predictor, called DeepTF. Computational experiments show that DeepTF outperforms several existing TFBS predictors.
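The abstract does not spell out the MNOH scheme or the network hyperparameters. As a rough illustration only, the sketch below assumes that MNOH one-hot encodes each overlapping k-mer (for example, dinucleotides with k = 2) as a 4^k-dimensional vector, and shows how parallel convolutions of several kernel widths ("multi-scale" convolution) can be applied to that encoding. The function names `kmer_index`, `mnoh`, and `multi_scale_conv`, the choice of k, the "same" zero padding, and the hand-set filters are all hypothetical; the LSTM stage and training are omitted, and the paper's actual encoding and architecture may differ.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_index(k):
    """Map every k-mer over ACGT to a column index in lexicographic order."""
    return {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=k))}


def mnoh(seq, k=2):
    """Hypothetical multi-nucleotide one-hot (MNOH) encoding (an assumption).

    Each overlapping k-mer seq[i:i+k] becomes a 4**k-dimensional one-hot row,
    so adjacent nucleotides are encoded jointly rather than independently.
    A sequence of length L yields L - k + 1 rows; k = 1 reduces to the usual
    per-nucleotide one-hot. k-mers containing a non-ACGT symbol (e.g. 'N')
    are encoded as all-zero rows.
    """
    seq = seq.upper()
    index = kmer_index(k)
    rows = []
    for i in range(len(seq) - k + 1):
        row = [0] * (4 ** k)
        j = index.get(seq[i:i + k])
        if j is not None:
            row[j] = 1
        rows.append(row)
    return rows


def multi_scale_conv(rows, filters_by_width):
    """Parallel 1-D convolutions with different (odd) kernel widths.

    rows: L x C input matrix (e.g. the output of mnoh).
    filters_by_width: {width: [filter, ...]}, each filter a width x C matrix.
    Every branch uses 'same' zero padding and a ReLU, so all branches keep
    length L and their outputs are concatenated position by position. In an
    MCNN-LSTM-style model, this per-position feature sequence would be what
    a recurrent layer such as an LSTM consumes (not shown here).
    """
    length, channels = len(rows), len(rows[0])
    out = [[] for _ in range(length)]
    for width in sorted(filters_by_width):
        pad = width // 2
        for filt in filters_by_width[width]:
            for pos in range(length):
                total = 0.0
                for w in range(width):
                    src = pos + w - pad
                    if 0 <= src < length:  # zero padding outside the sequence
                        total += sum(filt[w][c] * rows[src][c] for c in range(channels))
                out[pos].append(max(total, 0.0))  # ReLU activation
    return out


if __name__ == "__main__":
    x = mnoh("ACGTA", k=2)  # one row each for AC, CG, GT, TA
    print([row.index(1) for row in x])  # [1, 6, 11, 12]
```

Concatenating branches of different widths lets short and long sequence patterns be detected side by side at every position; in a trained model the filter weights would be learned rather than set by hand.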



Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61772273, 61373062) and the Fundamental Research Funds for the Central Universities (No. 30918011104).

Author information

Corresponding author

Correspondence to Dong-Jun Yu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Bao, XR., Zhu, YH., Yu, DJ. (2019). DeepTF: Accurate Prediction of Transcription Factor Binding Sites by Combining Multi-scale Convolution and Long Short-Term Memory Neural Network. In: Cui, Z., Pan, J., Zhang, S., Xiao, L., Yang, J. (eds) Intelligence Science and Big Data Engineering. Big Data and Machine Learning. IScIDE 2019. Lecture Notes in Computer Science, vol 11936. Springer, Cham. https://doi.org/10.1007/978-3-030-36204-1_10

  • DOI: https://doi.org/10.1007/978-3-030-36204-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36203-4

  • Online ISBN: 978-3-030-36204-1

  • eBook Packages: Computer Science, Computer Science (R0)
