Skip to main content

A Transformer-Based Deep Learning Approach with Multi-layer Feature Processing for Accurate Prediction of Protein-DNA Binding Residues

  • Conference paper
  • First Online:
Advanced Intelligent Computing Technology and Applications (ICIC 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14088))

Included in the following conference series:

  • 806 Accesses

Abstract

Proteins have significant biological effects when they bind to other substances, with binding to DNA being particularly crucial. Therefore, accurate identification of protein-DNA binding residues is important for further understanding of the protein-DNA interaction mechanism. Most current state-of-the-art methods are two-step approaches: the first step uses a sliding window technique to extract residue features; the second step uses each residue as an input to the model for prediction. This has a negative impact on the efficiency of prediction and ease of use. In this study, we propose a sequence-to-sequence (seq2seq) model that can input the entire protein sequence of variable length and use multiple modules including Transformer Encoder Module, Feature Fusion Module, and Feature Extraction Module for multi-layer feature processing. The Transformer Encoder Module is used to extract global features while the Feature Extraction Module is used to extract local features, further improving the recognition capability of the model. Comparison results on two benchmark datasets PDNA-543 and PDNA-41 demonstrate the effectiveness of our method in identifying protein-DNA binding residues. The code is available at https://github.com/HaipengZZhao/Prediction-of-Residues.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 99.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 129.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Dobson, C.M.: Chemical space and biology. Nature 432(7019), 824–828 (2004)

    Article  Google Scholar 

  2. Gao, M., Skolnick, J.: The distribution of ligand-binding pockets around protein-protein interfaces suggests a general mechanism for pocket formation. Proc. Natl. Acad. Sci. 109(10), 3784–3789 (2012)

    Article  Google Scholar 

  3. Zhao, J., Cao, Y., Zhang, L.: Exploring the computational methods for protein-ligand binding site prediction. Comput. Struct. Biotechnol. J. 18, 417–426 (2020)

    Article  Google Scholar 

  4. Ofran, Y., Mysore, V., Rost, B.: Prediction of DNA-binding residues from sequence. Bioinformatics 23(13), i347–i353 (2007)

    Article  Google Scholar 

  5. Jones, S., Van Heyningen, P., Berman, H.M., et al.: Protein-DNA interactions: a structural analysis. J. Mol. Biol. 287(5), 877–896 (1999)

    Article  Google Scholar 

  6. Smyth, M.S., Martin, J.H.J.: X Ray crystallography. Mol. Pathol. 53(1), 8 (2000)

    Article  Google Scholar 

  7. Nelson, J.D., Denisenko, O., Bomsztyk, K.: Protocol for the fast chromatin immunoprecipitation (ChIP) method. Nat. Protoc. 1(1), 179–185 (2006)

    Article  Google Scholar 

  8. Heffler, M.A., Walters, R.D., Kugel, J.F.: Using electrophoretic mobility shift assays to measure equilibrium dissociation constants: GAL4-p53 binding DNA as a model system. Biochem. Mol. Biol. Educ. 40(6), 383–387 (2012)

    Article  Google Scholar 

  9. Hellman, L.M., Fried, M.G.: Electrophoretic mobility shift assay (EMSA) for detecting protein–nucleic acid interactions. Nat. Protoc. 2(8), 1849–1861 (2007)

    Article  Google Scholar 

  10. Vajda, S., Guarnieri, F.: Characterization of protein-ligand interaction sites using experimental and computational methods. Curr. Opin. Drug Discov. Devel. 9(3), 354 (2006)

    Google Scholar 

  11. Ding, Y., Yang, C., Tang, J., et al.: Identification of protein-nucleotide binding residues via graph regularized k-local hyperplane distance nearest neighbor model. Appl. Intell. 1–15 (2022)

    Google Scholar 

  12. Wang, L., Brown, S.J.: BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences. Nucleic Acids Res. 34(suppl_2), W243-W248 (2006)

    Google Scholar 

  13. Chu, W.Y., Huang, Y.F., Huang, C.C., et al.: ProteDNA: a sequence-based predictor of sequence-specific DNA-binding residues in transcription factors. Nucleic Acids Res. 37(suppl_2), W396-W401 (2009)

    Google Scholar 

  14. Hwang, S., Gou, Z., Kuznetsov, I.B.: DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins. Bioinformatics 23(5), 634–636 (2007)

    Article  Google Scholar 

  15. Wang, L., Huang, C., Yang, M.Q., et al.: BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features. BMC Syst. Biol. 4, 1–9 (2010)

    Article  Google Scholar 

  16. Si, J., Zhang, Z., Lin, B., et al.: MetaDBSite: a meta approach to improve protein DNA-binding sites prediction. BMC Syst. Biol. 5(1), 1–7 (2011)

    Google Scholar 

  17. Hu, J., Li, Y., Zhang, M., et al.: Predicting protein-DNA binding residues by weightedly combining sequence-based features and boosting multiple SVMs. IEEE/ACM Trans. Comput. Biol. Bioinf. 14(6), 1389–1398 (2016)

    Article  Google Scholar 

  18. Liu, R., Hu, J.: DNABind: a hybrid algorithm for structure‐based prediction of DNA‐binding residues by combining machine learning‐and template‐based approaches. PROTEINS: Structure, Function Bioinform. 81(11), 1885–1899 (2013)

    Google Scholar 

  19. Zhu, Y.H., Hu, J., Song, X.N., et al.: DNAPred: accurate identification of DNA-binding sites from protein sequence by ensembled hyperplane-distance-based support vector machines. J. Chem. Inf. Model. 59(6), 3057–3071 (2019)

    Article  Google Scholar 

  20. Hu, J., Bai, Y.S., Zheng, L.L., et al.: Protein-DNA binding residue prediction via bagging strategy and sequence-based cube-format feature. IEEE/ACM Trans. Comput. Biol. Bioinf. 19(6), 3635–3645 (2021)

    Google Scholar 

  21. Altschul, S.F., Madden, T.L., Schäffer, A.A., et al.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389–3402 (1997)

    Article  Google Scholar 

  22. Gao, M., Skolnick, J.: DBD-Hunter: a knowledge-based method for the prediction of DNA–protein interactions. Nucleic Acids Res. 36(12), 3978–3992 (2008)

    Article  Google Scholar 

  23. Ozbek, P., Soner, S., Erman, B., et al.: DNABINDPROT: fluctuation-based predictor of DNA-binding residues within a network of interacting residues. Nucleic Acids Res. 38(suppl_2), W417-W423 (2010)

    Google Scholar 

  24. Chen, Y.C., Wright, J.D., Lim, C.: DR_bind: a web server for predicting DNA-binding residues from the protein structure based on electrostatics, evolution and geometry. Nucleic Acids Res. 40(W1), W249–W256 (2012)

    Article  Google Scholar 

  25. Tsuchiya, Y., Kinoshita, K., Nakamura, H.: PreDs: a server for predicting dsDNA-binding site on protein molecular surfaces. Bioinformatics 21(8), 1721–1723 (2005)

    Article  Google Scholar 

  26. Yu, D.J., Hu, J., Tang, Z.M., et al.: Improving protein-ATP binding residues prediction by boosting SVMs with random under-sampling. Neurocomputing 104, 180–190 (2013)

    Article  Google Scholar 

  27. Yang, J., Roy, A., Zhang, Y.: Protein–ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics 29(20), 2588–2595 (2013)

    Article  Google Scholar 

  28. Yu, D.J., Hu, J., Yang, J., et al.: Designing template-free predictor for targeting protein-ligand binding sites with classifier ensemble and spatial clustering. IEEE/ACM Trans. Comput. Biol. Bioinf. 10(4), 994–1008 (2013)

    Article  Google Scholar 

  29. Chen, K., Mizianty, M.J., Kurgan, L.: ATPsite: sequence-based prediction of ATP-binding residues proteome science. BioMed Central 9(1), 1–8 (2011)

    Google Scholar 

  30. Chen, K., Mizianty, M.J., Kurgan, L.: Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. Bioinformatics 28(3), 331–341 (2012)

    Article  Google Scholar 

  31. Zhang, Q., Wang, S., Chen, Z., et al.: Locating transcription factor binding sites by fully convolutional neural network. Brief. Bioinform. 22(5), bbaa435 (2021)

    Google Scholar 

  32. Cui, Z., Chen, Z.H., Zhang, Q.H., et al.: Rmscnn: a random multi-scale convolutional neural network for marine microbial bacteriocins identification. IEEE/ACM Trans. Comput. Biol. Bioinf. 19(6), 3663–3672 (2021)

    Google Scholar 

  33. Su, X., You, Z.H., Huang, D., et al.: Biomedical knowledge graph embedding with capsule network for multi-label drug-drug interaction prediction. IEEE Trans. Knowl. Data Eng. (2022)

    Google Scholar 

  34. Cui, Y., Dong, Q., Hong, D., et al.: Predicting protein-ligand binding residues with deep convolutional neural networks. BMC Bioinform. 20(1), 1–12 (2019)

    Article  Google Scholar 

  35. Li, W., Godzik, A.: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22(13), 1658–1659 (2006)

    Article  Google Scholar 

  36. Wang, Y., Ding, Y., Guo, F., et al.: Improved detection of DNA-binding proteins via compression technology on PSSM information. PLoS ONE 12(9), e0185587 (2017)

    Article  Google Scholar 

  37. Ding, Y., Tang, J., Guo, F.: Identification of protein–ligand binding sites by sequence information and ensemble classifier. J. Chem. Inf. Model. 57(12), 3149–3161 (2017)

    Article  Google Scholar 

  38. Ahmad, S., Sarai, A.: PSSM-based prediction of DNA binding sites in proteins. BMC Bioinformatics 6, 1–6 (2005)

    Article  Google Scholar 

  39. UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47(D1), D506-D515 (2019)

    Google Scholar 

Download references

Acknowledgement

This paper is supported by the National Natural Science Foundation of China (62073231, 62176175, 61902271), National Research Project (2020YFC2006602), Provincial Key Laboratory for Computer Information Processing Technology, Soochow University (KJS2166), Opening Topic Fund of Big Data Intelligent Engineering Laboratory of Jiangsu Province (SDGC2157).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hongjie Wu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Zhao, H., Zhu, B., Jiang, T., Cui, Z., Wu, H. (2023). A Transformer-Based Deep Learning Approach with Multi-layer Feature Processing for Accurate Prediction of Protein-DNA Binding Residues. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14088. Springer, Singapore. https://doi.org/10.1007/978-981-99-4749-2_47

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-4749-2_47

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4748-5

  • Online ISBN: 978-981-99-4749-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics