Abstract
Proteins are vital biomolecules that carry out distinct biological activities by interacting with other proteins in complex biological systems. Characterizing protein–protein interaction (PPI) sites and hot spots is of primary importance in drug discovery and in understanding cellular signaling. Given the significance of PPIs, an intelligent prediction system based on fuzzy logic, “PPIs-FuzzyKNN”, is developed for identifying PPI sites. Protein sequences are transformed into equal-length numerical descriptors using the physicochemical properties of amino acids and a position-specific scoring matrix (PSSM). We utilized both conventional machine learning algorithms and fuzzy k-nearest neighbors. The models are assessed with a tenfold cross-validation test. The proposed model, PPIs-FuzzyKNN, obtained accuracies of 91.20%, 92.65%, and 93.50% on three datasets, namely Dtestset72, PDBtestset164, and Dset186, respectively. These results show that the proposed model performs consistently across all three datasets and compares favorably with methods reported in the literature to date. Consequently, it is expected not only to play a leading role in the accurate identification of PPI sites but also to serve as a basic tool for the research community.
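To make the classification and evaluation steps concrete, the following is a minimal sketch of a fuzzy k-nearest neighbor classifier in the style of Keller et al. (1985), scored with tenfold cross-validation. It is not the authors' implementation: the function names, the values k = 5 and m = 2, the use of crisp (one-hot) training memberships, and the Euclidean distance are all assumptions made for illustration, and the feature vectors are expected to be already-computed fixed-length descriptors rather than raw sequences.

```python
import numpy as np


def fuzzy_knn_predict(X_train, y_train, X_test, k=5, m=2.0, eps=1e-9):
    """Fuzzy k-NN: return (predicted labels, class memberships).

    Assumptions (not taken from the paper): crisp one-hot training
    memberships, Euclidean distance, fuzzifier m > 1.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)

    # One-hot membership of every training sample in every class.
    U = (y_train[:, None] == classes[None, :]).astype(float)

    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)

    # Indices and distances of the k nearest training samples.
    nn = np.argsort(dists, axis=1)[:, :k]
    nn_d = np.take_along_axis(dists, nn, axis=1)

    # Inverse-distance weights 1 / d^(2/(m-1)); eps avoids division by zero.
    w = 1.0 / (nn_d ** (2.0 / (m - 1.0)) + eps)

    # Weighted average of the neighbors' class memberships.
    memberships = (w[:, :, None] * U[nn]).sum(axis=1) / w.sum(axis=1, keepdims=True)
    return classes[memberships.argmax(axis=1)], memberships


def tenfold_accuracy(X, y, k=5, m=2.0, seed=0):
    """Overall accuracy from a shuffled tenfold cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, 10)
    correct = 0
    for i, test_idx in enumerate(folds):
        # Train on the nine folds that are not held out.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred, _ = fuzzy_knn_predict(X[train_idx], y[train_idx], X[test_idx], k, m)
        correct += int((pred == y[test_idx]).sum())
    return correct / len(y)
```

Unlike plain k-NN, the classifier returns a graded membership for each class, so a residue near the decision boundary can be flagged as uncertain instead of receiving a hard vote. In practice X would hold the PSSM and physicochemical descriptors and y the interaction-site labels.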



Acknowledgements
The study is supported by the Taif University Researchers Supporting Project number (TURSP-2020/126), Taif University, Taif, Saudi Arabia.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Tahir, M., Khan, F., Hayat, M. et al. An effective machine learning-based model for the prediction of protein–protein interaction sites in health systems. Neural Comput & Applic 36, 65–75 (2024). https://doi.org/10.1007/s00521-022-07024-8
DOI: https://doi.org/10.1007/s00521-022-07024-8