ABSTRACT
DNA-binding proteins (DBPs) play an important role in various biological processes, including DNA replication, recombination, and repair. Because of this role, DBP identification remains an active challenge. DBPs were initially identified by experimental methods, which are expensive and time-consuming. For this reason, machine learning-based prediction methods have been developed over the last decades. Although several such methods exist, there is still room to improve their performance. One way to improve DBP prediction performance is to select an appropriate algorithm for extracting feature vectors from amino acid sequences. In this paper, we use PsePSSM as the feature representation and an SVM with the RBF kernel, combined with FC feature selection, as the predictive model. The best-performing configuration was determined by evaluating the parameters of PsePSSM, SVM, and FC. The best parameter settings achieved an accuracy of 79.45% and an AUC of 79.6%.
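The abstract names PsePSSM as the feature representation but does not give its formula. The sketch below follows the definition commonly used in the literature, which may differ in detail from the authors' implementation: the first block of features holds the column means of a normalized PSSM, and each further block holds the mean squared difference between rows separated by a given lag. The function name `pse_pssm` and the parameter `max_lag` are hypothetical, and the input is assumed to be an already normalized L x 20 matrix (for example, parsed from PSI-BLAST output).

```python
from typing import List


def pse_pssm(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Return a PsePSSM-style feature vector of length n_cols * (1 + max_lag).

    Assumption: `pssm` is an L x n_cols matrix (n_cols is 20 for amino
    acids) whose scores were normalized beforehand.
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("PSSM must contain at least one row")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all PSSM rows must have the same number of columns")
    if not 0 <= max_lag < length:
        raise ValueError("max_lag must satisfy 0 <= max_lag < sequence length")

    # Block 0: average score of each column over the whole sequence.
    features = [sum(row[j] for row in pssm) / length for j in range(n_cols)]

    # Blocks 1..max_lag: mean squared difference between positions i and
    # i + lag, computed per column, which captures sequence-order effects.
    for lag in range(1, max_lag + 1):
        span = length - lag
        for j in range(n_cols):
            total = sum((pssm[i][j] - pssm[i + lag][j]) ** 2 for i in range(span))
            features.append(total / span)
    return features


if __name__ == "__main__":
    # Toy 3 x 2 matrix used only to show the output layout.
    toy = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    print(pse_pssm(toy, max_lag=2))
```

Under this definition, every protein maps to a vector of the same length regardless of sequence length, which is what allows the vectors to be passed to a fixed-input classifier such as an SVM. The lag limit is one of the PsePSSM parameters that an evaluation like the one described in the abstract would tune.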
Index Terms
- Prediction of DNA binding protein using FC feature selection in SVM with PsePSSM feature representation