DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information

Journal of Computer-Aided Molecular Design

Abstract

DNA-binding proteins (DBPs) participate in various biological processes, including DNA replication, recombination, and repair. In the human genome, about 6–7% of genes encode DBPs. DBPs shape DNA into a compact structure known as chromatin, while some of these proteins regulate chromosome packaging and transcription. In the pharmaceutical industry, DBPs are used as key components of antibiotics, steroids, and cancer drugs. These proteins are also involved in biophysical, biological, and biochemical studies of DNA. Because of their crucial role in various biological activities, the identification of DBPs is an active topic in protein science. A series of experimental and computational methods have been proposed; however, some did not achieve the desired results, while others are inadequate in accuracy and reliability. More intelligent computational predictors are therefore still highly desirable. In this work, we introduce a computational method, DP-BINDER, based on physicochemical and evolutionary information. We captured local, highly discriminative features from the physicochemical properties of primary protein sequences via normalized Moreau-Broto autocorrelation (NMBAC), and evolutionary information via position-specific scoring matrix-transition probability composition (PSSM-TPC) and pseudo-position-specific scoring matrix (PsePSSM), using training and independent datasets. The optimal features were selected from the fused features by support vector machine-recursive feature elimination with correlation bias reduction (SVM-RFE + CBR) and were fed into random forest (RF) and support vector machine (SVM) classifiers. Our method attained 92.46% and 89.58% accuracy with jackknife and ten-fold cross-validation, respectively, on the training dataset, and 81.17% accuracy on the independent dataset for the prediction of DBPs. These results demonstrate that our method attained the highest success rate in the literature. The superiority of DP-BINDER over existing approaches is due to several factors, including the extraction of locally dominant features via effective feature descriptors, the use of an appropriate feature selection algorithm, and an effective classifier.
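To make two of the descriptors named in the abstract concrete, the following is a minimal sketch of NMBAC and PsePSSM in their commonly used textbook forms. It is not the authors' implementation. The Kyte-Doolittle hydropathy table, the function names, and the choice of lags are illustrative assumptions, since this preview does not state which physicochemical properties or lag values DP-BINDER uses. The PsePSSM function assumes the PSSM rows have already been normalized.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index. Assumption: used here only as an
# illustrative property; the paper's actual property set is not given above.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop):
    """Rescale a property table to zero mean and unit (population) std."""
    mu = mean(prop.values())
    sigma = pstdev(prop.values())
    return {aa: (value - mu) / sigma for aa, value in prop.items()}


def nmbac(sequence, prop, max_lag):
    """Return [NMBAC(1), ..., NMBAC(max_lag)] for one property.

    NMBAC(lag) = 1/(L - lag) * sum_i P(i) * P(i + lag), where P is the
    standardized property value of residue i. Non-standard residues
    (for example "X") raise KeyError in this sketch.
    """
    scaled = standardize(prop)
    values = [scaled[aa] for aa in sequence]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    features = []
    for lag in range(1, max_lag + 1):
        total = sum(values[i] * values[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


def pse_pssm(pssm, max_lag):
    """Return column means followed by lagged squared differences.

    pssm is a list of L rows, each with the same number of normalized
    scores (20 for a standard PSSM). The output has cols * (1 + max_lag)
    values.
    """
    n = len(pssm)
    cols = len(pssm[0])
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the number of rows")
    features = [sum(row[j] for row in pssm) / n for j in range(cols)]
    for lag in range(1, max_lag + 1):
        for j in range(cols):
            total = sum((pssm[i][j] - pssm[i + lag][j]) ** 2
                        for i in range(n - lag))
            features.append(total / (n - lag))
    return features
```

In a pipeline like the one the abstract describes, vectors such as these would be concatenated per protein before feature selection and classification. Calling `nmbac("ACDEFGHIKL", HYDROPATHY, 3)` yields three values for a single property.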

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61772273, 61373062) and the Fundamental Research Funds for the Central Universities (Grant No. 30918011104).

Author information

Corresponding author

Correspondence to Farman Ali.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Ali, F., Ahmed, S., Swati, Z.N.K. et al. DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information. J Comput Aided Mol Des 33, 645–658 (2019). https://doi.org/10.1007/s10822-019-00207-x

  • DOI: https://doi.org/10.1007/s10822-019-00207-x
