Abstract
Identifying the interacting sites of proteins is relevant for drug and vaccine design, and it provides clues for understanding protein function. Although this prediction problem has been extensively addressed in the literature, only a few approaches rely on the protein sequence alone. Working from the sequence matters because the three-dimensional structure of a protein may be unknown. Moreover, determining a structure experimentally is expensive and time-consuming, and the result may contain experimental errors. On the other hand, sequence-based methods suffer when knowledge of the sequence is incomplete.
In this work, we present ProSPs, a method for predicting interacting protein residues from protein sequence fragments, which are obtained using sliding windows and become the samples of an imbalanced binary classification problem. We train a Random Forest classifier on these samples. Each amino acid is enriched with a selected subset of physicochemical and biochemical amino acid characteristics from the AAIndex1 database. We test the framework on two classes of proteins, Antibody-Antigen and Antigen-Bound Antibody, extracted from the Protein-Protein Docking Benchmark 5.0. On these classes, the results, evaluated in terms of the area under the ROC curve (AU-ROC), outperform the sequence-based algorithms in the literature and are comparable with those of methods based on three-dimensional structure.
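As a rough illustration of the fragment construction described above, the following sketch cuts a sequence into sliding-window fragments and encodes each residue with one physicochemical value. Several details are assumptions, since the abstract does not specify them: the window size, the padding symbol `X`, labelling each fragment by its central residue, and the use of the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) as a stand-in for the selected AAIndex1 features. The function names are hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (AAindex entry KYTJ820101).
# Assumption: a stand-in for the selected AAIndex1 features, which the
# abstract does not enumerate.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

PAD = "X"  # hypothetical padding symbol for positions past the sequence ends


def sliding_windows(sequence: str, labels: List[int],
                    size: int = 5) -> List[Tuple[str, int]]:
    """Return one (fragment, label) pair per residue.

    Assumption: each fragment takes the label of its central residue
    (1 = interacting, 0 = non-interacting).
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    half = size // 2
    padded = PAD * half + sequence + PAD * half
    return [(padded[i:i + size], labels[i]) for i in range(len(sequence))]


def encode(fragment: str) -> List[float]:
    """Map each residue to its property value; padding and unknowns map to 0.0."""
    return [HYDROPATHY.get(aa, 0.0) for aa in fragment]


# Toy data: a made-up sequence with made-up interaction labels.
sequence = "ACDKLW"
labels = [0, 0, 1, 1, 0, 0]

samples = sliding_windows(sequence, labels, size=3)
X = [encode(fragment) for fragment, _ in samples]  # feature vectors
y = [label for _, label in samples]                # binary targets
```

The resulting `X` and `y` could then be passed to a classifier; with scikit-learn, for instance, `RandomForestClassifier(class_weight="balanced")` is one common way to account for class imbalance, although the abstract does not say how ProSPs handles it.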
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Quadrini, M., Cavallin, M., Daberdaku, S., Ferrari, C. (2022). ProSPs: Protein Sites Prediction Based on Sequence Fragments. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2021. Lecture Notes in Computer Science, vol 13163. Springer, Cham. https://doi.org/10.1007/978-3-030-95467-3_41
DOI: https://doi.org/10.1007/978-3-030-95467-3_41
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95466-6
Online ISBN: 978-3-030-95467-3
eBook Packages: Computer Science, Computer Science (R0)