Using Deep Learning to Predict Transcription Factor Binding Sites Combining Raw DNA Sequence, Evolutionary Information and Epigenomic Data

Xu, Youhong; Zhang, Qinghu; Chen, Zhanheng; Yuan, Changan; Qin, Xiao; Wu, Hongjie

doi:10.1007/978-3-030-84532-2_35

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12838)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

DNA-binding proteins (DBPs) play an important role in various regulatory tasks. In recent years, with the development of deep learning, fields such as natural language processing and computer vision have achieved great success. Models such as DeepBind brought deep learning to motif discovery and have also achieved great success in predicting DNA-transcription factor binding. However, these methods required integrating multiple features, such as secondary structure, with raw DNA sequences, and their performance could be further improved. In this paper, we propose an efficient and simple neural network-based architecture, DBPCNN, which integrates conservation scores and epigenomic data with raw DNA sequences to predict in-vitro DNA-protein binding sequences. We show that adding conservation scores and epigenomic data to raw DNA sequences can significantly improve the overall performance of the proposed model. Moreover, the automatic extraction of the DNA-binding proteins can enhance our understanding of the binding specificities of DBPs. We verify the effectiveness of our model on 20 motif datasets from in-vitro protein binding microarray data. More specifically, the average area under the receiver operating characteristic curve (AUC) improved by 0.58% with conservation scores, 1.29% with MeDIP-seq, 1.20% with histone modifications, and 2.19% with all three combined. The mean average precision (AP) increased by 0.62% with conservation scores, 1.46% with MeDIP-seq, 1.27% with histone modifications, and 2.29% with all three combined.
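As a rough illustration of the idea in the abstract, the sketch below shows one plausible way to combine a raw DNA sequence with per-base signal tracks (for example conservation scores) into a single multi-channel input, scan it with convolution kernels, and score predictions with AUC. It is not the DBPCNN implementation: the function names, the 0.25 encoding for unknown bases, the kernel shapes, and all numeric values are assumptions made for the example, and the abstract does not specify how the tracks are aligned or normalised.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (4, L) array; unknown bases (e.g. N) get 0.25 per row."""
    x = np.zeros((4, len(seq)), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            x[j, i] = 1.0
        else:
            x[:, i] = 0.25  # assumption: spread unknown bases uniformly
    return x

def stack_tracks(seq, *tracks):
    """Append per-base signal tracks below the one-hot rows, giving (4 + k, L)."""
    rows = [one_hot(seq)]
    for track in tracks:
        t = np.asarray(track, dtype=np.float32)
        if t.shape != (len(seq),):
            raise ValueError("each track needs exactly one value per base")
        rows.append(t[None, :])
    return np.concatenate(rows, axis=0)

def conv_relu_maxpool(x, kernels, bias):
    """Valid 1D convolution over positions, then ReLU and global max pooling."""
    width = kernels.shape[2]
    # windows has shape (channels, positions, width)
    windows = np.lib.stride_tricks.sliding_window_view(x, width, axis=1)
    act = np.einsum("cpw,kcw->kp", windows, kernels) + bias[:, None]
    return np.maximum(act, 0.0).max(axis=1)

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Toy usage with made-up values (hypothetical, not data from the paper).
seq = "ACGTN"
conservation = [0.1, 0.9, 0.8, 0.2, 0.0]
x = stack_tracks(seq, conservation)            # 4 one-hot rows + 1 track row
rng = np.random.default_rng(0)
kernels = rng.normal(size=(2, x.shape[0], 3))  # 2 untrained kernels of width 3
features = conv_relu_maxpool(x, kernels, np.zeros(2))
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Stacking the extra tracks as additional input channels keeps the convolution unchanged: each kernel simply sees sequence and signal values for the same window, which is one simple way to let a model weigh both kinds of evidence together.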

Acknowledgements

This work was supported by the National Key R&D Program of China (Nos. 2018AAA0100100 and 2018YFA0902600), partly supported by the National Natural Science Foundation of China (Grant nos. 61861146002, 61732012, 61772370, 62002266, 61932008, 61772357, and 62073231), and supported by the “BAGUI Scholar” Program and the Scientific & Technological Base and Talent Special Program, GuiKe AD18126015, of the Guangxi Zhuang Autonomous Region of China.

Author information

Authors and Affiliations

Guangxi Academy of Science, Nanning, 530007, China
Youhong Xu, Qinghu Zhang, Zhanheng Chen & Changan Yuan
School of Computer and Information Engineering, Nanning Normal University, Nanning, 530299, China
Youhong Xu, Qinghu Zhang, Zhanheng Chen & Xiao Qin
School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
Youhong Xu, Qinghu Zhang, Zhanheng Chen & Hongjie Wu

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Shenzhen University, Shenzhen, China
Jianqiang Li
Far Eastern Branch of the Russian Academy of Sciences, Vladivostok, Russia
Valeriya Gribova
University of Wollongong, North Wollongong, NSW, Australia
Prashan Premaratne

About this paper

Cite this paper

Xu, Y., Zhang, Q., Chen, Z., Yuan, C., Qin, X., Wu, H. (2021). Using Deep Learning to Predict Transcription Factor Binding Sites Combining Raw DNA Sequence, Evolutionary Information and Epigenomic Data. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12838. Springer, Cham. https://doi.org/10.1007/978-3-030-84532-2_35

DOI: https://doi.org/10.1007/978-3-030-84532-2_35
Published: 09 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84531-5
Online ISBN: 978-3-030-84532-2
eBook Packages: Computer Science, Computer Science (R0)
