A Transformer-Based Deep Learning Approach with Multi-layer Feature Processing for Accurate Prediction of Protein-DNA Binding Residues

Zhao, Haipeng; Zhu, Baozhong; Jiang, Tengsheng; Cui, Zhiming; Wu, Hongjie

doi:10.1007/978-981-99-4749-2_47


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14088)

Included in the following conference series:

International Conference on Intelligent Computing


Abstract

Proteins exert significant biological effects when they bind to other molecules, and binding to DNA is particularly crucial. Accurate identification of protein-DNA binding residues is therefore essential for a deeper understanding of protein-DNA interaction mechanisms. Most current state-of-the-art methods take a two-step approach: first, a sliding-window technique extracts features for each residue; second, each residue is fed to the model separately for prediction. This per-residue design lowers prediction efficiency and makes the methods less convenient to use. In this study, we propose a sequence-to-sequence (seq2seq) model that takes an entire variable-length protein sequence as input and performs multi-layer feature processing through several modules, including a Transformer Encoder Module, a Feature Fusion Module, and a Feature Extraction Module. The Transformer Encoder Module extracts global features, while the Feature Extraction Module extracts local features, further improving the model's recognition capability. Comparative results on two benchmark datasets, PDNA-543 and PDNA-41, demonstrate the effectiveness of our method in identifying protein-DNA binding residues. The code is available at https://github.com/HaipengZZhao/Prediction-of-Residues.
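The abstract contrasts per-residue sliding-window inputs with a seq2seq model that consumes the whole sequence at once. The following plain-Python sketch (no deep-learning framework) illustrates only that difference in data flow, under stated assumptions. The hypothetical `sliding_windows` builds one zero-padded (2w+1)-row window per residue, as two-step methods do. The hypothetical `seq2seq_scores` instead processes all residues in one call: a toy single-head self-attention stands in for the Transformer Encoder Module (global features), a moving average (a fixed-kernel 1-D convolution) stands in for the Feature Extraction Module (local features), and per-residue concatenation is an assumed Feature Fusion Module. All weights are fixed toy values rather than trained parameters, and none of this is the authors' architecture; the linked repository holds their actual implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = residues, columns = per-residue features


def sliding_windows(features: Matrix, w: int) -> List[Matrix]:
    """Two-step style input: one zero-padded (2w+1) x d window per residue."""
    d = len(features[0])
    pad = [[0.0] * d for _ in range(w)]
    padded = pad + features + pad
    return [padded[i:i + 2 * w + 1] for i in range(len(features))]


def self_attention(x: Matrix) -> Matrix:
    """Global features: single-head scaled dot-product attention, identity Q/K/V (toy)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(wt * v[j] for wt, v in zip(weights, x)) for j in range(d)])
    return out


def local_mean(x: Matrix, k: int) -> Matrix:
    """Local features: zero-padded moving average over k neighbouring residues (k odd)."""
    half = k // 2
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append([sum(x[t][j] for t in range(lo, hi)) / k for j in range(d)])
    return out


def seq2seq_scores(features: Matrix, k: int = 3) -> List[float]:
    """One pass over a variable-length sequence -> one score in (0, 1) per residue."""
    global_feats = self_attention(features)
    local_feats = local_mean(features, k)
    # Assumed fusion: concatenate the global and local vectors of each residue.
    fused = [g + l for g, l in zip(global_feats, local_feats)]
    # Toy output head: mean of the fused features passed through a sigmoid.
    return [1.0 / (1.0 + math.exp(-sum(row) / len(row))) for row in fused]


seq = [[0.1, 0.5], [0.9, 0.2], [0.4, 0.4], [0.0, 1.0]]  # 4 residues, 2 features each
windows = sliding_windows(seq, w=2)
print(len(windows), len(windows[0]))  # 4 5
print(len(seq2seq_scores(seq)))       # 4
```

Because `seq2seq_scores` never fixes the sequence length, a 4-residue and a 400-residue protein go through the same single call and come back with one score per residue. The sliding-window route instead enlarges the input roughly (2w+1)-fold and requires a separate model evaluation per residue, which is the efficiency and usability cost the abstract describes.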




Acknowledgement

This work was supported by the National Natural Science Foundation of China (62073231, 62176175, 61902271), the National Research Project (2020YFC2006602), the Provincial Key Laboratory for Computer Information Processing Technology, Soochow University (KJS2166), and the Opening Topic Fund of the Big Data Intelligent Engineering Laboratory of Jiangsu Province (SDGC2157).

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
Haipeng Zhao, Baozhong Zhu, Zhiming Cui & Hongjie Wu
Gusu School, Nanjing Medical University, Suzhou, Jiangsu, China
Tengsheng Jiang


Corresponding author

Correspondence to Hongjie Wu.

Editor information

Editors and Affiliations

Department of Computer Science, Eastern Institute of Technology, Zhejiang, China
De-Shuang Huang
University of Wollongong, North Wollongong, NSW, Australia
Prashan Premaratne
Zhengzhou University of Light Industry, Zhengzhou, China
Baohua Jin
Zhong Yuan University of Technology, Zhengzhou, China
Boyang Qu
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Department of Computer Science, Liverpool John Moores University, Liverpool, UK
Abir Hussain



About this paper

Cite this paper

Zhao, H., Zhu, B., Jiang, T., Cui, Z., Wu, H. (2023). A Transformer-Based Deep Learning Approach with Multi-layer Feature Processing for Accurate Prediction of Protein-DNA Binding Residues. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14088. Springer, Singapore. https://doi.org/10.1007/978-981-99-4749-2_47


DOI: https://doi.org/10.1007/978-981-99-4749-2_47
Published: 30 July 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4748-5
Online ISBN: 978-981-99-4749-2
eBook Packages: Computer Science, Computer Science (R0)
