DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding

Zhang, Yongqing; Qiao, Shaojie; Ji, Shengjie; Li, Yizhou

doi:10.1007/s13042-019-00990-x


Original Article
Published: 29 July 2019

Volume 11, pages 841–851 (2020)

International Journal of Machine Learning and Cybernetics

Yongqing Zhang¹,²,
Shaojie Qiao ORCID: orcid.org/0000-0002-4703-780X³,
Shengjie Ji¹ &
…
Yizhou Li⁴

2060 Accesses
51 Citations

Abstract

Transcription factors (TFs) are proteins that bind to specific cis-regulatory sub-regions of DNA promoters and initiate transcription, the process by which genetic information in DNA is copied into RNA. Several computational methods based on convolutional neural networks (CNNs) have been developed to predict DNA–protein binding sites from DNA sequence. However, within a CNN framework these techniques cannot adequately capture the dependencies between positions in a DNA sequence, and they are not accurate enough at predicting DNA–protein binding sites from sequence alone. In this study, we propose DeepSite, which combines a bidirectional long short-term memory (BLSTM) network with a CNN to capture long-term dependencies between sequence motifs in DNA. Unlike a traditional CNN, DeepSite comprises six layers: an input layer, a BLSTM layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer. When tested on 690 ChIP-seq experiments from ENCODE, DeepSite predicts DNA–protein binding sites with 87.12% sensitivity, 91.06% specificity, 89.19% accuracy and an MCC of 0.783. Finally, we conclude that the proposed method can also be applied to identify DNA–protein binding sites in other DNA sequences.
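The six-layer pipeline named in the abstract (input, BLSTM, convolution, pooling, fully connected, output) can be illustrated with a small forward pass. This is a minimal, untrained sketch in plain Python, not DeepSite's implementation: the one-hot input encoding, the layer sizes (`hidden`, `n_filters`, `width`, `fc_units`), ReLU activations, global max pooling and the sigmoid output are all assumptions, the weights are random, and every function name here is hypothetical.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Input layer (assumed encoding): A/C/G/T -> 4-dim indicator; other bases -> zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm(inputs, W, b):
    """One LSTM direction; W maps [x_t, h_{t-1}] to the 4 stacked gates (i, f, o, g)."""
    hidden = len(b) // 4
    h, c, outputs = [0.0] * hidden, [0.0] * hidden, []
    for x in inputs:
        z = [zi + bi for zi, bi in zip(matvec(W, x + h), b)]  # x + h concatenates lists
        i = [sigmoid(v) for v in z[:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        outputs.append(h)
    return outputs


def blstm(inputs, p):
    """BLSTM layer: forward and backward LSTMs, concatenated per position (2 * hidden)."""
    fwd = lstm(inputs, p["W_f"], p["b_f"])
    bwd = lstm(inputs[::-1], p["W_b"], p["b_b"])[::-1]  # re-align to forward positions
    return [hf + hb for hf, hb in zip(fwd, bwd)]


def conv1d_relu(seq, filters, biases):
    """Convolutional layer: 'valid' 1-D convolution over positions, then ReLU."""
    width = len(filters[0])
    out = []
    for t in range(len(seq) - width + 1):
        window = seq[t:t + width]
        out.append([
            max(0.0, bias + sum(w * x for wrow, xrow in zip(f, window) for w, x in zip(wrow, xrow)))
            for f, bias in zip(filters, biases)
        ])
    return out


def global_max_pool(feature_maps):
    """Pooling layer (assumed global max): strongest response of each filter."""
    return [max(column) for column in zip(*feature_maps)]


def init_params(hidden=8, n_filters=16, width=5, fc_units=8, seed=0):
    """Hypothetical layer sizes; random, untrained weights."""
    rng = random.Random(seed)
    return {
        "W_f": rand_matrix(4 * hidden, 4 + hidden, rng), "b_f": [0.0] * (4 * hidden),
        "W_b": rand_matrix(4 * hidden, 4 + hidden, rng), "b_b": [0.0] * (4 * hidden),
        "filters": [rand_matrix(width, 2 * hidden, rng) for _ in range(n_filters)],
        "conv_b": [0.0] * n_filters,
        "W_fc": rand_matrix(fc_units, n_filters, rng), "b_fc": [0.0] * fc_units,
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(fc_units)], "b_out": 0.0,
    }


def predict(seq, p):
    """Input -> BLSTM -> conv -> pooling -> fully connected -> sigmoid output."""
    h = blstm(one_hot(seq), p)
    pooled = global_max_pool(conv1d_relu(h, p["filters"], p["conv_b"]))
    dense = [max(0.0, v + b) for v, b in zip(matvec(p["W_fc"], pooled), p["b_fc"])]
    logit = sum(w * v for w, v in zip(p["w_out"], dense)) + p["b_out"]
    return sigmoid(logit)  # score in (0, 1), read as a binding-site probability


if __name__ == "__main__":
    params = init_params()
    print(round(predict("ACGTTGCAGGCTAACGTGCA", params), 4))
```

Because the BLSTM sits before the convolution in the layer order given above, each convolutional filter in this sketch scans features that already summarize context from both directions of the sequence rather than raw bases; a trained model would of course learn the weights by back-propagation instead of using random ones.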


Notes

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/

Abbreviations

Acc:: Accuracy
AUC:: Area under the ROC curve
BLSTM:: Bidirectional long short-term memory
BP:: Back-propagation algorithm
CNN:: Convolutional neural network
ENCODE:: Encyclopedia of DNA Elements
FN:: Number of false negatives
FP:: Number of false positives
GPU:: Graphics processing unit
MCC:: Matthews correlation coefficient
PFM:: Position frequency matrix
Pre:: Precision
PSSM:: Position-specific scoring matrix
ROC:: Receiver operating characteristic
Sen:: Sensitivity
Spe:: Specificity
TN:: Number of true negatives
TP:: Number of true positives
TFs:: Transcription factors
TFBS:: Transcription factor binding site
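The count abbreviations (TP, TN, FP, FN) combine into the reported metrics (Sen, Spe, Pre, Acc, MCC, AUC) through standard textbook formulas, for example MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes them in plain Python. It is an illustration only: this preview does not show the paper's evaluation code or decision threshold, so the helper names (`confusion_counts`, `threshold_metrics`, `auc`) and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
import math


def _ratio(num, den):
    """Return num/den, or 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = binding site, 0 = background)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(tp, tn, fp, fn):
    """Sen, Spe, Pre, Acc and MCC from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sen": _ratio(tp, tp + fn),
        "Spe": _ratio(tn, tn + fp),
        "Pre": _ratio(tp, tp + fp),
        "Acc": _ratio(tp + tn, tp + tn + fp + fn),
        "MCC": _ratio(tp * tn - fp * fn, mcc_den),
    }


def auc(y_true, scores):
    """AUC = probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return _ratio(wins, len(pos) * len(neg))


if __name__ == "__main__":
    counts = confusion_counts([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
    print(counts)                      # (3, 3, 1, 1)
    print(threshold_metrics(*counts))  # Sen, Spe, Pre, Acc = 0.75; MCC = 0.5
    print(auc([1, 0, 1, 0], [0.9, 0.3, 0.6, 0.6]))  # 0.875
```

The pairwise AUC is O(P·N) and is meant for clarity; for large test sets a rank-based (Mann–Whitney) computation gives the same value more efficiently.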


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant nos. 61702058, 61772091, 61802035, 71701026; the China Postdoctoral Science Foundation funded project under Grant no. 2017M612948; the Scientific Research Foundation for Education Department of Sichuan Province under Grant no. 18ZA0098; the Sichuan Science and Technology Program under Grant nos. 2018JY0448, 2019YFG0106, 2019YFS0067, 2018GZ0307; the Natural Science Foundation of Guangxi under Grant no. 2018GXNSFDA138005; the Innovative Research Team Construction Plan in Universities of Sichuan Province under Grant no. 18TD0027; the Fund of Science and Technology Department of Guizhou Province under Grant no. J[2014]2134; the Scientific Research Foundation for Young Academic Leaders of Chengdu University of Information Technology under Grant nos. J201706, J201701; the Scientific Research Foundation for Advanced Talents of Chengdu University of Information Technology under Grant nos. KYTZ201717, KYTZ201715, KYTZ201750; and the Guangdong Key Laboratory Project under Grant no. 2017B030314073.

Author information

Authors and Affiliations

School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
Yongqing Zhang & Shengjie Ji
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, China
Yongqing Zhang
School of Software Engineering, Chengdu University of Information Technology, Chengdu, 610225, China
Shaojie Qiao
College of Chemistry, Sichuan University, Chengdu, 610064, China
Yizhou Li


Corresponding author

Correspondence to Shaojie Qiao.

Ethics declarations

Conflict of interest

There is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Zhang, Y., Qiao, S., Ji, S. et al. DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding. Int. J. Mach. Learn. & Cyber. 11, 841–851 (2020). https://doi.org/10.1007/s13042-019-00990-x


Received: 28 June 2018
Accepted: 22 July 2019
Published: 29 July 2019
Issue Date: April 2020
DOI: https://doi.org/10.1007/s13042-019-00990-x
