Identification of protein-nucleotide binding residues via graph regularized k-local hyperplane distance nearest neighbor model

Ding, Yijie; Yang, Chao; Tang, Jijun; Guo, Fei

doi:10.1007/s10489-021-02737-0

Published: 14 September 2021

Applied Intelligence, volume 52, pages 6598–6612 (2022)

Yijie Ding¹, Chao Yang², Jijun Tang²,³ & Fei Guo⁴

Abstract

Accurate identification of protein-nucleotide binding residues is crucial for the study of drug structure and for protein functional annotation. Predicting these residues is a typical class-imbalance problem: the minority class (binding residues) is far smaller than the majority class (non-binding residues). Traditional machine learning algorithms are poorly suited to this setting because their predictions become heavily biased toward the majority class. To deal with this severe imbalance, we propose a new computational method that identifies protein-nucleotide binding residues via a Graph Regularized k-local Hyperplane Distance Nearest Neighbor model (GHKNN). On the training set, we compare the performance of the basic classifier, the ensemble classifier and the single classifier. On the independent test sets, we compare its performance with that of other existing models. The experimental results show that the proposed method identifies protein-nucleotide binding residues more accurately than other existing models. The data and material are freely available at https://github.com/guofei-tju/GHKNN.
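
As a rough illustration of the idea behind the method's name, the following Python sketch implements a plain k-local hyperplane distance nearest neighbor classifier. It does not include the graph regularization term that distinguishes GHKNN, and it is not taken from the article or from the GHKNN repository. The function names `hknn_distance` and `hknn_predict`, the parameters `k` and `lam`, and the two-dimensional toy data are all assumptions made for illustration. For each class, the query is compared with the hyperplane spanned by its k nearest neighbors in that class, so a small minority class is judged by its own local geometry rather than outvoted by the majority.

```python
import numpy as np


def hknn_distance(x, class_points, k=3, lam=1.0):
    """Penalized squared distance from x to the local hyperplane of one class.

    Hypothetical helper: `k` is the number of neighbors, `lam` a ridge penalty.
    """
    # k nearest neighbors of x among the points of this class (Euclidean)
    dists = np.linalg.norm(class_points - x, axis=1)
    neighbors = class_points[np.argsort(dists)[:k]]
    centroid = neighbors.mean(axis=0)
    # Columns of V are the directions from the centroid to each neighbor
    V = (neighbors - centroid).T
    r = x - centroid
    # Ridge-penalized least squares: (V^T V + lam * I) alpha = V^T r
    alpha = np.linalg.solve(V.T @ V + lam * np.eye(V.shape[1]), V.T @ r)
    residual = r - V @ alpha
    return float(residual @ residual + lam * (alpha @ alpha))


def hknn_predict(x, X_train, y_train, k=3, lam=1.0):
    """Assign x to the class whose local hyperplane is closest."""
    labels = np.unique(y_train)
    scores = [hknn_distance(x, X_train[y_train == c], k, lam) for c in labels]
    return labels[int(np.argmin(scores))]


# Toy imbalanced data (assumed): 25 "non-binding" points (label 0) on a grid
# around the origin and only 3 "binding" points (label 1) near (5, 5).
non_binding = np.array(
    [[i, j] for i in range(-2, 3) for j in range(-2, 3)], dtype=float
)
binding = np.array([[5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
X_train = np.vstack([non_binding, binding])
y_train = np.array([0] * len(non_binding) + [1] * len(binding))

pred_near_binding = hknn_predict(np.array([5.2, 5.1]), X_train, y_train)
pred_near_origin = hknn_predict(np.array([0.2, 0.1]), X_train, y_train)
print(pred_near_binding, pred_near_origin)
```

Because every class contributes exactly k neighbors, the 25-to-3 imbalance in the toy data does not by itself pull the decision toward the majority class, which is the property the abstract appeals to.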

Acknowledgements

This work is supported by the National Natural Science Foundation of China (NSFC 61902271, 61772362 and 61972280), the Natural Science Research of Jiangsu Higher Education Institutions of China (19KJB520014) and the National Key R&D Program of China (2020YFA0908400).

The authors would like to thank Professor Dong-jun Yu for providing the dataset, which helped improve the quality of this paper.

Author information

Authors and Affiliations

Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
Yijie Ding
School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
Chao Yang & Jijun Tang
Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29208, USA
Jijun Tang
School of Computer Science and Engineering, Central South University, Changsha, 410083, China
Fei Guo

Corresponding authors

Correspondence to Yijie Ding or Fei Guo.

Ethics declarations

Conflict of Interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yijie Ding and Chao Yang contributed equally to this work and are joint first authors.

About this article

Cite this article

Ding, Y., Yang, C., Tang, J. et al. Identification of protein-nucleotide binding residues via graph regularized k-local hyperplane distance nearest neighbor model. Appl Intell 52, 6598–6612 (2022). https://doi.org/10.1007/s10489-021-02737-0

Accepted: 29 July 2021
Published: 14 September 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s10489-021-02737-0
