A Prediction Method of DNA-Binding Proteins Based on Evolutionary Information

Lu, Weizhong; Song, Zhengwei; Ding, Yijie; Wu, Hongjie; Huang, Hongmei

doi:10.1007/978-3-030-26969-2_40

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11644)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

DNA is the carrier of genetic information in organisms, and DNA-binding proteins, a class that includes unwinding enzymes, play a key role in many biological molecular functions. This importance has motivated extensive research into methods for identifying DNA-binding proteins. In recent years, researchers have developed machine learning-based methods that predict DNA-binding proteins quickly and accurately. Although the accuracy of current methods is considerable, their performance can be further improved. In this paper, a DNA-binding protein prediction model based on PSSM (Position-Specific Scoring Matrix) features and a Random Forest classifier is proposed. Experimental results show that the proposed method achieves accuracies of 82.14% and 79.0% on the PDB1075 and PDB186 datasets, respectively. The method is comparable to existing methods and even surpasses them on some datasets.
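
The abstract does not say how the PSSM features are derived, so the following is only a minimal sketch of one common option, not the authors' method: each raw PSSM score is squashed into (0, 1) with the logistic function, and each of the 20 columns is then averaged over the sequence, giving a fixed-length vector regardless of protein length. The function name `pssm_to_feature` and the toy matrix are hypothetical and chosen for illustration.

```python
import math


def pssm_to_feature(pssm):
    """Collapse an L x 20 PSSM into a fixed-length vector (one value per column).

    Assumption: logistic scaling followed by column averaging. The paper's
    actual feature extraction is not described in the abstract.
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one row")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all PSSM rows must have the same length")

    # Squash every raw score into the open interval (0, 1).
    scaled = [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]

    # Average each column over the sequence length.
    length = len(scaled)
    return [sum(row[j] for row in scaled) / length for j in range(n_cols)]


# Toy 3-residue matrix with 20 columns (made-up values, not PSI-BLAST output).
toy_pssm = [
    [0] * 20,
    [2] + [0] * 19,
    [-2] + [0] * 19,
]
feature = pssm_to_feature(toy_pssm)
```

A vector like `feature` could then be passed, one per protein, to any random forest implementation (for example scikit-learn's `RandomForestClassifier`) together with binding / non-binding labels.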

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61772357, 61502329, 61672371, and 61876217), the Jiangsu Province 333 Talent Project, the Top Talent Project (DZXX-010), and the Suzhou Foresight Research Project (SYG201704, SNG201610, and SZS201609).

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
Weizhong Lu, Zhengwei Song, Yijie Ding, Hongjie Wu & Hongmei Huang
Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou University of Science and Technology, Suzhou, 215009, China
Weizhong Lu, Yijie Ding & Hongjie Wu

Corresponding author

Correspondence to Yijie Ding.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Nanchang Institute of Technology, Nanchang, China
Zhi-Kai Huang

About this paper

Cite this paper

Lu, W., Song, Z., Ding, Y., Wu, H., Huang, H. (2019). A Prediction Method of DNA-Binding Proteins Based on Evolutionary Information. In: Huang, DS., Jo, KH., Huang, ZK. (eds) Intelligent Computing Theories and Application. ICIC 2019. Lecture Notes in Computer Science, vol 11644. Springer, Cham. https://doi.org/10.1007/978-3-030-26969-2_40

DOI: https://doi.org/10.1007/978-3-030-26969-2_40
Published: 24 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26968-5
Online ISBN: 978-3-030-26969-2
eBook Packages: Computer Science, Computer Science (R0)
