
A New Method Combining DNA Shape Features to Improve the Prediction Accuracy of Transcription Factor Binding Sites

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12464)

Abstract

Identifying transcription factor (TF) binding sites (TFBSs) plays an important role in the computational inference of gene regulation. With the development of high-throughput technologies, many conventional methods and deep learning models have been applied to the identification of TFBSs. However, most of these methods predict TFBSs from the raw DNA sequence alone, which leads to low accuracy. Therefore, we propose a dual-channel convolutional neural network (CNN) model, named DCDS, that combines DNA sequences and DNA shape features to predict TFBSs. In the DCDS model, the convolution layer captures low-level features from the input data, and parallel pooling operations find the most significant activation signal in a sequence for each filter, improving the prediction accuracy of TFBSs. We conduct a series of experiments on 66 in vitro datasets, and the experimental results show that the proposed DCDS model is superior to some state-of-the-art methods.
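As a rough illustration of the idea described in the abstract, the following sketch passes a one-hot encoded DNA sequence and a per-position shape feature through separate convolution channels and pools each filter's activations. It is not the authors' implementation: the abstract does not specify the architecture, so the filter weights, the single toy shape feature, and the reading of "parallel pooling" as global max pooling alongside global average pooling are all assumptions, and every function name here is hypothetical.

```python
# Illustrative sketch only. Filter values, the toy shape track, and the
# max + average pooling pair are assumptions, not details from the paper.

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot rows."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq]


def conv1d_relu(rows, kernel):
    """Slide a (width x channels) filter over (length x channels) input, then apply ReLU."""
    width = len(kernel)
    activations = []
    for start in range(len(rows) - width + 1):
        total = 0.0
        for i in range(width):
            for c in range(len(kernel[i])):
                total += rows[start + i][c] * kernel[i][c]
        activations.append(max(0.0, total))
    return activations


def parallel_pool(activations):
    """Pool one filter's activations two ways: global max and global average."""
    return max(activations), sum(activations) / len(activations)


def channel_features(rows, kernels):
    """Run every filter of one channel and concatenate the pooled values."""
    features = []
    for kernel in kernels:
        features.extend(parallel_pool(conv1d_relu(rows, kernel)))
    return features


def dual_channel_features(seq, shape_rows, seq_kernels, shape_kernels):
    """Concatenate pooled features from the sequence channel and the shape channel."""
    return (channel_features(one_hot(seq), seq_kernels)
            + channel_features(shape_rows, shape_kernels))


# Toy inputs: a 6-bp sequence, one made-up shape value per position,
# a sequence filter that matches "CGT", and a width-2 shape filter.
sequence = "ACGTAC"
shape_track = [[0.5], [1.0], [0.0], [0.5], [1.0], [0.0]]
seq_filters = [[[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]]
shape_filters = [[[1.0], [1.0]]]

features = dual_channel_features(sequence, shape_track, seq_filters, shape_filters)
print(features)
```

In a trained model the concatenated feature vector would feed a classifier layer that outputs a binding score; here the filters are fixed by hand only to make the data flow visible.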

Acknowledgments

This work was supported by the National Key R&D Program of China (Nos. 2018YFA0902600 and 2018AAA0100100); partly by the National Natural Science Foundation of China (Grant Nos. 61861146002, 61520106006, 61772370, 61702371, 61732012, 61932008, 61532008, 61672382, 61772357, and 61672203); by the China Postdoctoral Science Foundation (Grant No. 2017M611619); by the “BAGUI Scholar” Program and the Scientific & Technological Base and Talent Special Program (GuiKe AD18126015) of the Guangxi Zhuang Autonomous Region of China; and by the Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01), LCNBI, and ZJLab.

Author information

Corresponding author

Correspondence to Qinhu Zhang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, S. et al. (2020). A New Method Combining DNA Shape Features to Improve the Prediction Accuracy of Transcription Factor Binding Sites. In: Huang, DS., Jo, KH. (eds) Intelligent Computing Theories and Application. ICIC 2020. Lecture Notes in Computer Science, vol 12464. Springer, Cham. https://doi.org/10.1007/978-3-030-60802-6_8

  • DOI: https://doi.org/10.1007/978-3-030-60802-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60801-9

  • Online ISBN: 978-3-030-60802-6

  • eBook Packages: Computer Science, Computer Science (R0)
