An effective machine learning-based model for the prediction of protein–protein interaction sites in health systems

  • S.I.: Improving Healthcare outcomes using Multimedia Big Data Analytics
Neural Computing and Applications

Abstract

Proteins are vital biomolecules that carry out distinct biological activities by interacting with other proteins in complex biological systems. Characterizing protein–protein interaction (PPI) sites and hot spots is of primary importance in drug discovery and in understanding cellular signaling. Given the significance of PPIs, we developed an intelligent prediction system based on fuzzy logic, “PPIs-FuzzyKNN”, for identifying PPI sites. Protein sequences are transformed into equal-length numerical descriptors using the physicochemical properties of amino acids and a position-specific scoring matrix (PSSM). We evaluated conventional machine learning algorithms alongside fuzzy k-nearest neighbors and assessed each model with tenfold cross-validation. The proposed PPIs-FuzzyKNN model achieved accuracies of 91.20%, 92.65%, and 93.50% on three datasets, Dtestset72, PDBtestset164, and Dset186, respectively. These results are consistent across all datasets and compare favorably with those reported in the literature to date. Consequently, the model may contribute to the accurate identification of PPI sites and serve as a basic tool for the research community.
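To make the classification and evaluation steps concrete, the following is a minimal sketch of a fuzzy k-nearest-neighbor classifier scored with tenfold cross-validation. It is not the authors' implementation. It assumes crisp (one-hot) training memberships, Euclidean distance, a fuzzifier of m = 2, and k = 5. The function names `fuzzy_knn_predict` and `tenfold_accuracy` are hypothetical, and the feature vectors are synthetic stand-ins for the physicochemical and PSSM descriptors.

```python
import numpy as np


def fuzzy_knn_predict(X_train, y_train, X_test, k=5, m=2.0, eps=1e-12):
    """Fuzzy k-NN sketch; crisp training memberships are an assumption."""
    classes = np.unique(y_train)
    # One-hot membership of every training sample in every class.
    U = (y_train[:, None] == classes[None, :]).astype(float)
    # Pairwise Euclidean distances, shape (n_test, n_train).
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    # Indices and distances of the k nearest training samples.
    nn = np.argsort(d, axis=1)[:, :k]
    d_nn = np.take_along_axis(d, nn, axis=1)
    # Inverse-distance weights 1 / d^(2/(m-1)); eps avoids division by zero.
    w = 1.0 / (d_nn ** (2.0 / (m - 1.0)) + eps)
    # Weighted average of neighbor memberships, shape (n_test, n_classes).
    memberships = np.einsum("tk,tkc->tc", w, U[nn]) / w.sum(axis=1, keepdims=True)
    return classes[np.argmax(memberships, axis=1)], memberships


def tenfold_accuracy(X, y, n_splits=10, seed=0, **knn_args):
    """Overall accuracy pooled over n_splits shuffled, non-overlapping folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    correct = 0
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred, _ = fuzzy_knn_predict(X[train_idx], y[train_idx], X[test_idx], **knn_args)
        correct += int((pred == y[test_idx]).sum())
    return correct / len(y)


# Synthetic two-class data standing in for real sequence descriptors.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (100, 8)), rng.normal(3.0, 1.0, (100, 8))])
y = np.array([0] * 100 + [1] * 100)

acc = tenfold_accuracy(X, y, k=5, m=2.0)
print(f"Tenfold accuracy on synthetic data: {acc:.3f}")
```

Unlike plain k-NN, the classifier returns a graded membership for each class rather than only a hard vote, so borderline residues can be flagged by how close their memberships are. PPI-site datasets are typically imbalanced, so a stratified split would likely be preferable to the plain shuffled split used here.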

Acknowledgements

The study is supported by the Taif University Researchers Supporting Project number (TURSP-2020/126), Taif University, Taif, Saudi Arabia.

Author information

Corresponding authors

Correspondence to Fazlullah Khan or Maqsood Hayat.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tahir, M., Khan, F., Hayat, M. et al. An effective machine learning-based model for the prediction of protein–protein interaction sites in health systems. Neural Comput & Applic 36, 65–75 (2024). https://doi.org/10.1007/s00521-022-07024-8

  • DOI: https://doi.org/10.1007/s00521-022-07024-8