A Hybrid Deep Neural Network for the Prediction of In-Vivo Protein-DNA Binding by Combining Multiple-Instance Learning

Zhang, Yue; Chen, Yuehui; Bao, Wenzheng; Cao, Yi

doi:10.1007/978-3-030-84532-2_34

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12838)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Modeling in-vivo protein-DNA binding is fundamental to a deeper understanding of regulatory mechanisms, and it remains a challenging task in computational biology. Current deep-learning-based methods have achieved some success in predicting in-vivo protein-DNA binding, but they have two limitations. First, they tend to ignore the weakly supervised information in genome sequences: a bound DNA sequence has a high probability of containing more than one transcription factor binding site (TFBS). Second, one-hot encoding treats each category as independent of the others, so the dependence between nucleotides is ignored when it is used to encode DNA sequences. To address these problems, we developed a weakly supervised framework that combines multiple-instance learning (MIL) with a hybrid deep neural network and uses K-mer encoding instead of one-hot encoding to process DNA sequences, with the aim of modeling in-vivo protein-DNA binding. First, following the concepts of MIL, the input sequence is segmented into many overlapping instances, and K-mer encoding converts these instances into image-like inputs that capture high-order dependencies. Then, a hybrid deep neural network that integrates convolutional and recurrent neural networks computes a score for every instance in the same bag. Finally, the “Noisy-and” method integrates the predicted values of all instances into the final predicted value for the bag. This paper discusses the effect of K-mer encoding on the performance of the framework and evaluates “Noisy-and” against other fusion methods.
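The data-handling steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the instance length, the stride, and the noisy-and parameters `a` and `b` are all assumptions made for illustration. The noisy-and function follows the form commonly given for this pooling method (introduced by Kraus et al., 2016). The hybrid convolutional and recurrent network is not reproduced here, so the instance scores are hypothetical placeholder values.

```python
import math
from itertools import product
from typing import List

NUCLEOTIDES = "ACGT"


def segment(sequence: str, instance_len: int, stride: int) -> List[str]:
    """Split one sequence (the MIL 'bag') into overlapping instances."""
    last_start = len(sequence) - instance_len
    if last_start < 0:
        raise ValueError("sequence is shorter than the instance length")
    return [sequence[i:i + instance_len]
            for i in range(0, last_start + 1, stride)]


def kmer_encode(instance: str, k: int) -> List[List[int]]:
    """Encode an instance as a matrix with one row per k-mer position.

    Each row is a one-hot vector over the 4**k possible k-mers, so
    neighbouring nucleotides are encoded jointly rather than independently.
    K-mers containing a symbol outside ACGT (e.g. N) give an all-zero row.
    """
    index = {"".join(p): i
             for i, p in enumerate(product(NUCLEOTIDES, repeat=k))}
    rows = []
    for start in range(len(instance) - k + 1):
        row = [0] * (4 ** k)
        column = index.get(instance[start:start + k])
        if column is not None:
            row[column] = 1
        rows.append(row)
    return rows


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def noisy_and(instance_scores: List[float],
              a: float = 10.0, b: float = 0.5) -> float:
    """Pool instance probabilities into one bag probability.

    a (slope) and b (activation threshold) are assumed values here;
    in a trained model b would typically be learned.
    """
    mean_score = sum(instance_scores) / len(instance_scores)
    numerator = _sigmoid(a * (mean_score - b)) - _sigmoid(-a * b)
    denominator = _sigmoid(a * (1.0 - b)) - _sigmoid(-a * b)
    return numerator / denominator


# Usage with a toy sequence; the scores stand in for network outputs.
bag = segment("ACGTACGTAC", instance_len=6, stride=2)
encoded = [kmer_encode(instance, k=2) for instance in bag]
placeholder_scores = [0.9, 0.2, 0.1]  # hypothetical, one per instance
print(bag)
print(len(encoded[0]), "rows x", len(encoded[0][0]), "columns per instance")
print(noisy_and(placeholder_scores))
```

With `k=1` the encoding reduces to ordinary one-hot encoding, which makes the two schemes easy to compare. Noisy-and returns 0 when every instance score is 0 and 1 when every score is 1, rising steeply once the mean score passes `b`.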

Acknowledgments

This work was supported in part by the University Innovation Team Project of Jinan (2019GXRC015), and in part by the Key Science & Technology Innovation Project of Shandong Province (2019JZZY010324), the Natural Science Foundation of China (No. 61902337), the Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 19KJB520016), the Jiangsu Provincial Natural Science Foundation (No. SBK2019040953), and Young Talents of Science and Technology in Jiangsu.

Author information

Authors and Affiliations

School of Information Science and Engineering, University of Jinan, Jinan, China
Yue Zhang
School of Artificial Intelligence Institute and Information Science and Engineering, University of Jinan, Jinan, China
Yuehui Chen
School of Information Engineering (School of Big Data), Xuzhou University of Technology, Xuzhou, China
Wenzheng Bao
Shandong Provincial Key Laboratory of Network Based Intelligent Computing (School of Information Science and Engineering), University of Jinan, Jinan, China
Yi Cao

Corresponding author

Correspondence to Wenzheng Bao.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Shenzhen University, Shenzhen, China
Jianqiang Li
Far Eastern Branch of the Russian Academy of Sciences, Vladivostok, Russia
Valeriya Gribova
University of Wollongong, North Wollongong, NSW, Australia
Prashan Premaratne

About this paper

Cite this paper

Zhang, Y., Chen, Y., Bao, W., Cao, Y. (2021). A Hybrid Deep Neural Network for the Prediction of In-Vivo Protein-DNA Binding by Combining Multiple-Instance Learning. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12838. Springer, Cham. https://doi.org/10.1007/978-3-030-84532-2_34

DOI: https://doi.org/10.1007/978-3-030-84532-2_34
Published: 09 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84531-5
Online ISBN: 978-3-030-84532-2
eBook Packages: Computer Science, Computer Science (R0)
