Abstract
DNA is the carrier of genetic information in organisms, and DNA-binding proteins, a class that includes DNA-unwinding enzymes, play a key role in many biological molecular functions. This has motivated extensive research into methods for identifying DNA-binding proteins. In recent years, researchers have developed machine learning-based methods that predict DNA-binding proteins quickly and accurately. Although the prediction accuracy of current methods is considerable, their performance can still be improved. In this paper, we propose a DNA-binding protein prediction model based on PSSM (Position-Specific Scoring Matrix) features and a Random Forest classifier. Experimental results show that the proposed method achieves accuracies of 82.14% and 79.0% on the PDB1075 and PDB186 datasets, respectively. The method is competitive with existing methods and surpasses previous methods on some datasets.
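The abstract does not say how each PSSM is turned into a fixed-length input for the Random Forest. As an illustration only, the sketch below computes one common PSSM-derived encoding, a 400-dimensional "PSSM-composition" vector. It assumes that the PSSM comes from PSI-BLAST with columns in the order ARNDCQEGHILKMFPSTWYV and that raw log-odds scores are squashed with a sigmoid. The function names (`pssm_composition`, `accuracy`) are hypothetical, and this is not necessarily the feature set used in the paper. The sketch also includes the standard accuracy formula, ACC = (TP + TN) / (TP + TN + FP + FN), which is the metric the results are reported in.

```python
import math
from typing import List

# Assumed column order of the 20 standard amino acids in PSI-BLAST ASCII PSSM output
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def sigmoid(x: float) -> float:
    """Map a raw PSSM log-odds score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pssm_composition(sequence: str, pssm: List[List[float]]) -> List[float]:
    """Hypothetical 400-D PSSM-composition feature vector.

    PSSM rows are grouped by the residue type at each position, their
    sigmoid-scaled scores are summed column-wise into a 20 x 20 matrix,
    and the result is divided by the sequence length and flattened.
    """
    if len(sequence) != len(pssm):
        raise ValueError("PSSM must have exactly one row per residue")
    if not sequence:
        raise ValueError("sequence is empty")
    totals = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm):
        if len(row) != 20:
            raise ValueError("each PSSM row must have 20 columns")
        i = AA_INDEX.get(residue)
        if i is None:  # skip non-standard residues such as X or U
            continue
        for j, score in enumerate(row):
            totals[i][j] += sigmoid(score)
    n = len(sequence)
    # Row-major flattening: feature index = 20 * residue_type + pssm_column
    return [value / n for row in totals for value in row]


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)
```

Each protein then becomes one 400-element row of a feature matrix. That matrix could be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier` (`fit(X, y)`, then `predict(X)`). The number of trees and the other hyperparameters used in the paper are not given here.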
Acknowledgements
This paper is supported by the National Natural Science Foundation of China (61772357, 61502329, 61672371, and 61876217), Jiangsu Province 333 Talent Project, Top Talent Project (DZXX-010), Suzhou Foresight Research Project (SYG201704, SNG201610, and SZS201609).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lu, W., Song, Z., Ding, Y., Wu, H., Huang, H. (2019). A Prediction Method of DNA-Binding Proteins Based on Evolutionary Information. In: Huang, DS., Jo, KH., Huang, ZK. (eds) Intelligent Computing Theories and Application. ICIC 2019. Lecture Notes in Computer Science, vol 11644. Springer, Cham. https://doi.org/10.1007/978-3-030-26969-2_40
DOI: https://doi.org/10.1007/978-3-030-26969-2_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26968-5
Online ISBN: 978-3-030-26969-2
eBook Packages: Computer Science, Computer Science (R0)