ABSTRACT
Peptide-binding proteins are abundant in living cells, and protein-peptide interactions mediate a wide range of cellular functions. Prediction of protein-peptide binding residues has been an important and active research topic over the past decades, and machine learning methods have gained increasing attention in recent years. However, the data imbalance problem has not been dealt with effectively. To address this, we study how sampling methods and the degree of class imbalance affect the construction of prediction models. We first developed the NearMiss under-sampling method (NMUS) to select a given number of high-quality samples from the majority class and thereby balance the data sets. The resulting sensitivity (SEN) of 0.818 shows the advantage of NMUS in handling the class imbalance problem. This work provides an analysis of the data imbalance problem and achieves better prediction of protein-peptide binding interactions.
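The under-sampling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state which NearMiss variant or which distance measure NMUS uses, so the code below assumes the NearMiss-1 heuristic (keep the majority samples with the smallest mean distance to their k nearest minority samples) and Euclidean distance. The function names `nearmiss1` and `sensitivity` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def nearmiss1(majority, minority, n_keep, k=3):
    """Assumed NearMiss-1 variant: keep the n_keep majority points whose
    mean Euclidean distance to their k nearest minority points is smallest."""
    k = min(k, len(minority))
    scored = []
    for idx, point in enumerate(majority):
        # Distances from this majority point to every minority point, ascending.
        dists = sorted(math.dist(point, m) for m in minority)
        scored.append((sum(dists[:k]) / k, idx))
    scored.sort()
    return [majority[idx] for _, idx in scored[:n_keep]]


def sensitivity(y_true, y_pred):
    """Sensitivity (SEN) = TP / (TP + FN), with 1 as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy 2-D feature vectors (hypothetical): binding residues are the minority class.
minority = [(0.0, 0.0), (1.0, 0.0)]
majority = [(0.5, 0.5), (5.0, 5.0), (0.0, 1.0), (9.0, 9.0), (2.0, 0.0), (7.0, 1.0)]

# Under-sample the majority class down to the size of the minority class.
balanced_majority = nearmiss1(majority, minority, n_keep=len(minority), k=2)
print(balanced_majority)
```

In this sketch the retained majority samples are those lying closest to the minority class, which is the sense in which a NearMiss-style method picks a fixed number of samples from the majority class. A classifier would then be trained on `minority` plus `balanced_majority`, and `sensitivity` applied to its predictions on held-out data.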
Index Terms
- Study of Data Imbalanced Problem in Protein-peptide Binding Prediction