DOI:10.1145/3427423.3427462
research-article

Prediction of DNA binding protein using FC feature selection in SVM with PsePSSM feature representation

Published: 28 December 2020

ABSTRACT

DNA-binding proteins (DBPs) play an important role in various biological processes, including DNA replication, recombination, and repair. Because of this role, DBP identification remains an active challenge. DBPs were initially identified by experimental methods, but these are expensive and time-consuming. For this reason, machine learning-based prediction methods have been developed over the last decades. Although several such methods exist, there is still room to improve prediction performance. One way to improve DBP prediction is to select an appropriate algorithm for extracting feature vectors from amino acid sequences. In this paper we use PsePSSM as the feature representation and an SVM with the RBF kernel combined with FC feature selection as the predictive model. The best configuration was determined by evaluating the parameters of PsePSSM, SVM, and FC. The best-performing parameters achieved an accuracy of 79.45% and an AUC of 79.6%.
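
As an illustration of the feature representation named in the abstract, the following is a minimal sketch of the commonly used PsePSSM construction: the column means of a PSSM followed by, for each lag, the mean squared difference between rows that lie that many positions apart. It is not taken from the paper. The function name `pse_pssm`, the parameter `max_lag`, and the assumption that the PSSM has already been normalized are all choices made for this example, and the paper's actual parameter settings are not given in the abstract.

```python
from typing import List


def pse_pssm(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Build a PsePSSM-style feature vector from a (normalized) PSSM.

    pssm    : one row per residue, one column per amino acid type
              (20 columns for a real PSSM; assumed normalized beforehand).
    max_lag : largest sequence-order lag to include (hypothetical name).

    Returns n_cols * (1 + max_lag) values: the column means, then for each
    lag 1..max_lag the per-column mean squared difference between rows
    i and i + lag.
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("PSSM must contain at least one row")
    if not 0 <= max_lag < length:
        raise ValueError("max_lag must satisfy 0 <= max_lag < number of rows")
    n_cols = len(pssm[0])

    # Part 1: average score of each column over the whole sequence.
    features = [sum(row[j] for row in pssm) / length for j in range(n_cols)]

    # Part 2: sequence-order terms, one block of n_cols values per lag.
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (pssm[i][j] - pssm[i + lag][j]) ** 2
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Toy 3-residue, 2-column matrix (not a real PSSM), just to show the shape.
toy = [[1.0, 0.0], [3.0, 0.0], [5.0, 4.0]]
toy_features = pse_pssm(toy, max_lag=1)
```

With a real 20-column PSSM, the resulting vector has 20 × (1 + `max_lag`) entries regardless of protein length, which is what makes it usable as fixed-size input to a classifier such as an SVM.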

Published in

SIET '20: Proceedings of the 5th International Conference on Sustainable Information Engineering and Technology
November 2020
277 pages
ISBN: 9781450376051
DOI: 10.1145/3427423

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

SIET '20 Paper Acceptance Rate: 45 of 57 submissions, 79%
Overall Acceptance Rate: 45 of 57 submissions, 79%