A Prediction Method of DNA-Binding Proteins Based on Evolutionary Information

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11644)

Abstract

DNA is the carrier of genetic information in organisms, and DNA-binding proteins, a class that includes unwinding enzymes, play key roles in many molecular functions. This importance has motivated extensive research into methods for identifying DNA-binding proteins. In recent years, researchers have developed machine-learning-based methods that predict DNA-binding proteins quickly and accurately. Although current methods achieve considerable accuracy, their performance can be improved further. In this paper, we propose a DNA-binding protein prediction model based on PSSM (Position-Specific Scoring Matrix) features and a Random Forest classifier. Experiments show that the proposed method achieves accuracies of 82.14% and 79.0% on the PDB1075 and PDB186 datasets, respectively. These results are comparable to those of existing methods and surpass them on some datasets.
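
The abstract does not say how the PSSM is converted into a fixed-length feature vector, so the following is only a minimal sketch under an assumed scheme: each score is squashed with a logistic function and the 20 columns are averaged over the sequence length. The function name `pssm_mean_features` and the toy matrix are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Map a raw PSSM score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pssm_mean_features(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Reduce an L x 20 PSSM to a 20-dimensional vector.

    Assumed scheme (not confirmed by the paper): normalise every
    score with a sigmoid, then average each of the 20 amino-acid
    columns over all L sequence positions.
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one row")
    if any(len(row) != 20 for row in pssm):
        raise ValueError("every PSSM row must have 20 columns")
    n_rows = len(pssm)
    return [
        sum(sigmoid(row[col]) for row in pssm) / n_rows
        for col in range(20)
    ]


# Toy 2 x 20 matrix standing in for a real PSI-BLAST profile.
toy_pssm = [
    [2.0] + [0.0] * 19,
    [-2.0] + [0.0] * 19,
]
features = pssm_mean_features(toy_pssm)
```

A vector of this kind, computed for every protein, could then be passed to an off-the-shelf Random Forest implementation such as scikit-learn's `RandomForestClassifier`. Whether the authors used that library or this descriptor is not stated here.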

Acknowledgements

This paper is supported by the National Natural Science Foundation of China (61772357, 61502329, 61672371, and 61876217), the Jiangsu Province 333 Talent Project, the Top Talent Project (DZXX-010), and the Suzhou Foresight Research Project (SYG201704, SNG201610, and SZS201609).

Author information

Corresponding author

Correspondence to Yijie Ding.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lu, W., Song, Z., Ding, Y., Wu, H., Huang, H. (2019). A Prediction Method of DNA-Binding Proteins Based on Evolutionary Information. In: Huang, DS., Jo, KH., Huang, ZK. (eds) Intelligent Computing Theories and Application. ICIC 2019. Lecture Notes in Computer Science, vol 11644. Springer, Cham. https://doi.org/10.1007/978-3-030-26969-2_40

  • DOI: https://doi.org/10.1007/978-3-030-26969-2_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26968-5

  • Online ISBN: 978-3-030-26969-2

  • eBook Packages: Computer Science (R0)
