
Using hidden Markov models to predict DNA-binding proteins with sequence and structure information

  • Methodologies and Application
  • Soft Computing

Abstract

In the post-genome era, protein domain structures are being published rapidly, but they have not been studied comprehensively. To understand cell function, recent research has used protein–DNA interactions to interpret protein domain structures. Several machine-learning-based methods have been applied to this problem; however, they do not efficiently translate the tertiary-structure characteristics of proteins into features suitable for predicting DNA-binding proteins. In this work, a novel machine-learning approach based on hidden Markov models identifies the characteristics of DNA-binding proteins from their amino acid sequences and tertiary structures. After we extract features from DNA-binding proteins, a support vector machine-based classifier predicts general DNA-binding proteins with an accuracy of 88.45% under five-fold cross-validation. Furthermore, we construct a response-element-specific classifier for predicting response-element-specific DNA-binding proteins, and it achieves a precision of 96.57% and a recall of 88.83% on average. To verify the predictions, we used DNA-binding proteins from MCF-7 that are likely to bind estrogen response elements (ERE), and the results show that our methods are applicable in practice.
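
As an illustration of how a hidden Markov model can turn a sequence into a numeric feature for a downstream classifier, the following is a minimal sketch of the forward algorithm in log space. It is not the authors' pipeline: the two-state model, the four-letter toy alphabet, all probabilities, and the helper names `forward_log_likelihood` and `hmm_feature` are assumptions made purely for illustration.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | model) for a discrete HMM (forward algorithm).

    start[s]    : probability that the first hidden state is s
    trans[s][t] : probability of moving from state s to state t
    emit[s][x]  : probability that state s emits symbol x
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    states = list(start)
    # Initialisation: start probability times emission of the first symbol.
    alpha = {s: _log(start[s]) + _log(emit[s].get(seq[0], 0.0)) for s in states}
    # Recursion: sum over predecessor states, then emit the next symbol.
    for x in seq[1:]:
        alpha = {
            t: log_sum_exp([alpha[s] + _log(trans[s][t]) for s in states])
            + _log(emit[t].get(x, 0.0))
            for t in states
        }
    # Termination: sum over all possible final states.
    return log_sum_exp(list(alpha.values()))


# Hypothetical toy model: state "B" favours K/R, state "N" favours A/L.
TOY_START = {"B": 0.5, "N": 0.5}
TOY_TRANS = {"B": {"B": 0.8, "N": 0.2}, "N": {"B": 0.1, "N": 0.9}}
TOY_EMIT = {
    "B": {"K": 0.4, "R": 0.4, "A": 0.1, "L": 0.1},
    "N": {"K": 0.1, "R": 0.1, "A": 0.4, "L": 0.4},
}


def hmm_feature(seq):
    """Length-normalised log-likelihood, usable as one classifier feature."""
    return forward_log_likelihood(seq, TOY_START, TOY_TRANS, TOY_EMIT) / len(seq)


if __name__ == "__main__":
    for seq in ("KRKRK", "ALLAL"):
        print(seq, round(hmm_feature(seq), 3))
```

In a setup like the one the abstract describes, scores of this kind from several models could be collected into a feature vector and passed to a support vector machine; the normalisation by sequence length here is one possible design choice, not something stated in the abstract.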



Author information

Corresponding author

Correspondence to Hung-Yu Kao.

Additional information

Communicated by G. Acampora.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (docx 17 KB)


About this article

Cite this article

Hsu, YY., Chen, WJ., Chen, SH. et al. Using hidden Markov models to predict DNA-binding proteins with sequence and structure information. Soft Comput 18, 2365–2376 (2014). https://doi.org/10.1007/s00500-013-1210-8


  • DOI: https://doi.org/10.1007/s00500-013-1210-8
