research-article

Prediction of DNA binding protein using FC feature selection in SVM with PsePSSM feature representation

Author:
Achmad Ridok

Brawijaya University, Malang, Indonesia


SIET '20: Proceedings of the 5th International Conference on Sustainable Information Engineering and Technology, November 2020, Pages 35–39, https://doi.org/10.1145/3427423.3427462

Published: 28 December 2020

ABSTRACT

DNA-binding proteins (DBPs) play an important role in various biological processes, including DNA replication, recombination, and repair. Because of this role, identifying DBPs remains an active research challenge. DBP identification was initially carried out experimentally, but experimental methods are expensive and time-consuming. For this reason, machine learning-based prediction methods have been developed over the last decades. Although several such methods exist, there is still room to improve their performance. One way to improve DBP prediction performance is to select an appropriate algorithm for extracting feature vectors from amino acid sequences. In this paper, we use PsePSSM as the feature representation and an SVM with the RBF kernel, combined with FC feature selection, as the predictive model. The best configuration is determined by evaluating the parameters of PsePSSM, the SVM, and FC. The best-performing parameters achieved an accuracy of 79.45% and an AUC of 79.6%.
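To make the PsePSSM representation named in the abstract more concrete, the following is a minimal sketch of the commonly used formulation: a position-specific scoring matrix (one row per residue, one column per amino acid type) is reduced to a fixed-length vector made of the column means plus, for each lag, the mean squared difference between rows that are `lag` positions apart. The abstract does not state the paper's normalization or lag setting, so the row-wise standardization and the `max_lag` parameter are assumptions, and the function names `normalize_rows` and `pse_pssm` are hypothetical.

```python
from typing import List


def normalize_rows(pssm: List[List[float]]) -> List[List[float]]:
    """Standardize each row (one residue position) to zero mean and unit variance.

    Assumption: row-wise standardization; the paper may normalize differently.
    """
    normalized = []
    for row in pssm:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        std = var ** 0.5
        # A constant row expresses no preference, so map it to zeros.
        normalized.append([(v - mean) / std if std > 0 else 0.0 for v in row])
    return normalized


def pse_pssm(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Return a PsePSSM vector of length n_cols * (1 + max_lag)."""
    length = len(pssm)
    if length <= max_lag:
        raise ValueError("sequence length must exceed max_lag")
    n_cols = len(pssm[0])
    p = normalize_rows(pssm)

    # Part 1: mean of each column over all sequence positions.
    features = [sum(p[i][j] for i in range(length)) / length
                for j in range(n_cols)]

    # Part 2: for each lag, the mean squared difference between
    # positions i and i + lag, computed per column.
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum((p[i][j] - p[i + lag][j]) ** 2
                        for i in range(length - lag))
            features.append(total / (length - lag))
    return features
```

With a standard 20-column PSSM and `max_lag=3`, this yields an 80-dimensional vector regardless of sequence length, which is what lets sequences of different lengths be fed to a classifier such as an SVM.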

Index Terms

1. Applied computing
  1. Life and medical sciences
    1. Computational biology


Published in
SIET '20: Proceedings of the 5th International Conference on Sustainable Information Engineering and Technology
November 2020
277 pages
ISBN: 9781450376051
DOI: 10.1145/3427423
General Chairs: Agung Setia Budi (Universitas Brawijaya, Indonesia), Sigit Adinugroho (Universitas Brawijaya, Indonesia)
Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
DNA binding protein
FC feature selection
PsePSSM
RBF
SVM
Acceptance Rates
SIET '20 Paper Acceptance Rate: 45 of 57 submissions, 79%
Overall Acceptance Rate: 45 of 57 submissions, 79%