Protein-Protein Interaction Sites Prediction Based on an Under-Sampling Strategy and Random Forest Algorithm


Abstract:

Computational methods for predicting protein-protein interaction sites can avoid the high cost and long turnaround of traditional experimental approaches. However, the severe class imbalance between interface and non-interface residues in protein sequences limits the prediction performance of these methods. This work therefore proposes a new strategy, NearMiss-based under-sampling for imbalanced datasets combined with Random Forest classification (NM-RF), to predict protein interaction sites. Residues in protein sequences are represented by PSSM-derived features, the hydropathy index (HI), and relative solvent accessibility (RSA). To resolve the class imbalance problem, an under-sampling method based on the NearMiss algorithm removes some non-interface residues, and the random forest algorithm then performs binary classification on the balanced feature datasets. Experiments show that the accuracy of the NM-RF model reaches 87.6% and 84.3% on Dtestset72 and PDBtestset164, respectively, which demonstrates the effectiveness of the proposed NM-RF method in differentiating interface from non-interface residues.
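
As a rough illustration of the under-sampling step described in the abstract, the sketch below balances a binary dataset with a NearMiss-1 style rule: every interface (minority) sample is kept, along with the non-interface (majority) samples whose mean distance to their k nearest interface samples is smallest. The abstract does not say which NearMiss variant or parameters were used, so the choice of NearMiss-1, the value of `k`, the function name `nearmiss1_undersample`, and the toy three-dimensional features are all assumptions made for illustration and not the authors' implementation.

```python
import math
import random


def nearmiss1_undersample(features, labels, k=3):
    """Balance a binary dataset with a NearMiss-1 style rule (illustrative).

    Keeps every minority sample (label 1, interface) and the majority
    samples (label 0, non-interface) whose mean Euclidean distance to
    their k nearest minority samples is smallest.
    """
    minority = [x for x, y in zip(features, labels) if y == 1]
    majority = [x for x, y in zip(features, labels) if y == 0]
    if not minority or len(majority) <= len(minority):
        return list(features), list(labels)  # nothing to under-sample

    def score(x):
        # Mean distance from x to its k closest minority samples.
        nearest = sorted(math.dist(x, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)

    # Keep as many majority samples as there are minority samples.
    kept = sorted(majority, key=score)[:len(minority)]
    balanced_x = minority + kept
    balanced_y = [1] * len(minority) + [0] * len(kept)
    return balanced_x, balanced_y


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy vectors standing in for per-residue features; values are synthetic.
    interface = [[rng.gauss(1.0, 0.5) for _ in range(3)] for _ in range(10)]
    non_interface = [[rng.gauss(-1.0, 1.0) for _ in range(3)] for _ in range(90)]
    X = interface + non_interface
    y = [1] * len(interface) + [0] * len(non_interface)
    balanced_x, balanced_y = nearmiss1_undersample(X, y, k=3)
    print(len(balanced_x), sum(balanced_y))
```

The balanced feature vectors and labels returned here could then be passed to any random forest implementation for the binary classification step; which implementation and hyperparameters the paper used is not stated in the abstract.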
Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics (Volume: 19, Issue: 6, 01 Nov.-Dec. 2022)
Page(s): 3646 - 3654
Date of Publication: 27 October 2021

PubMed ID: 34705656
