Identification of hormone-binding proteins using a novel ensemble classifier

Computing

Abstract

Hormone-binding proteins (HBPs) are important soluble carriers for growth hormones, and recognizing them correctly is crucial to understanding their functions. We therefore aimed to construct an efficient and reliable classifier that identifies HBPs accurately. First, 246 proteins were collected from the UniProt database to serve as the benchmark dataset. Each protein sample was encoded as an 8000-dimensional feature vector of tripeptide compositions. We then reduced this high-dimensional feature set with analysis of variance (ANOVA), a feature-ranking technique, and obtained an optimal feature subset free of redundant information. Next, three classification methods were applied to the selected tripeptide features, each producing a sequence of predicted probabilities. Finally, the three probability sequences were treated as new features and passed to a support vector machine to build the prediction model. The model achieved an accuracy of 90.6% in five-fold cross-validation, which is higher than that of other published methods.
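The first two steps of the pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the abstract does not specify. The function names, the skipping of windows that contain non-standard residues, and the two-class form of the F statistic are all assumptions made for illustration. The 8000 dimensions correspond to the 20 × 20 × 20 possible tripeptides over the standard amino acids.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20^3 = 8000 tripeptides in a fixed order, mapped to vector positions.
TRIPEPTIDE_INDEX = {
    "".join(t): i for i, t in enumerate(product(AMINO_ACIDS, repeat=3))
}


def tripeptide_composition(sequence):
    """Return the 8000-dimensional tripeptide frequency vector of a sequence."""
    counts = [0] * len(TRIPEPTIDE_INDEX)
    total = 0
    for i in range(len(sequence) - 2):
        idx = TRIPEPTIDE_INDEX.get(sequence[i:i + 3])
        if idx is not None:  # assumption: ignore windows with non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def anova_f_score(positive, negative):
    """One-way ANOVA F statistic of a single feature across two classes."""
    groups = [positive, negative]
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-class and within-class sums of squares.
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n - len(groups)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / df_between) / (within / df_within)


def rank_features(pos_vectors, neg_vectors):
    """Return feature indices sorted by decreasing F score."""
    n_features = len(pos_vectors[0])
    scores = [
        anova_f_score([v[j] for v in pos_vectors], [v[j] for v in neg_vectors])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

In such a setup, the top-ranked features would be kept and passed to the three base classifiers, whose predicted probabilities then become the inputs of the final support vector machine. That stacking step is not shown here.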

Acknowledgements

The work was supported by the National Key R&D Program of China (SQ2018YFC090002), and the Natural Science Foundation of China (Nos. 61771331, 61871282).

Author information

Corresponding author

Correspondence to Qing Wang.

About this article

Cite this article

Wang, K., Li, S., Wang, Q. et al. Identification of hormone-binding proteins using a novel ensemble classifier. Computing 101, 693–703 (2019). https://doi.org/10.1007/s00607-018-0682-x

  • DOI: https://doi.org/10.1007/s00607-018-0682-x
