
ProSPs: Protein Sites Prediction Based on Sequence Fragments

  • Conference paper
Machine Learning, Optimization, and Data Science (LOD 2021)

Abstract

Identifying the interaction sites of proteins is relevant to drug and vaccine design, and it provides clues for understanding protein function. Although this prediction problem has been extensively addressed in the literature, only a few approaches rely on the protein sequence alone. Sequence-only prediction matters because the three-dimensional structure of a protein may be unknown. Moreover, determining the structure experimentally is expensive and time-consuming, and the result may contain experimental errors. On the other hand, sequence-based methods suffer when knowledge of the sequence is incomplete.

In this work, we present ProSPs, a method for predicting interacting protein residues from protein sequence fragments. The fragments are obtained using sliding windows and become the samples of an unbalanced binary classification problem. We train a Random Forest classifier on these samples. Each amino acid is enriched with a selected subset of physicochemical and biochemical amino acid properties from the AAindex1 database. We test the framework on two classes of proteins, Antibody-Antigen and Antigen-Bound Antibody, extracted from the Protein-Protein Docking Benchmark 5.0. The results are evaluated by the area under the ROC curve (AU-ROC). On these classes, they outperform the sequence-based algorithms in the literature and are comparable to those of methods based on three-dimensional structure.
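The fragment-based sample construction described above can be sketched as follows. This is a minimal, standard-library-only Python illustration, not the authors' implementation. Several choices are hypothetical:

- the window half-width;
- padding past the chain termini with 0.0;
- the single property table.

The table is Kyte-Doolittle hydropathy (AAindex1 entry KYTJ820101), which stands in for the paper's selected subset of AAindex1 properties. Each residue yields one sample, labelled 1 when it is an interacting site. AU-ROC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative.

```python
from typing import Dict, List, Sequence, Tuple

# Kyte-Doolittle hydropathy (AAindex1 entry KYTJ820101). Illustrative stand-in
# (assumption) for the subset of AAindex1 properties selected in the paper.
HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Hypothetical list of property tables; more AAindex1 tables could be appended.
PROPERTY_TABLES: List[Dict[str, float]] = [HYDROPATHY]


def residue_features(aa: str) -> List[float]:
    """Property vector for one residue; unknown symbols and padding map to 0.0."""
    return [table.get(aa, 0.0) for table in PROPERTY_TABLES]


def sliding_window_samples(
    sequence: str, interface: Sequence[int], half_width: int = 3
) -> Tuple[List[List[float]], List[int]]:
    """Build one sample per residue from the window centred on it.

    The label is 1 if the central residue index appears in `interface`, else 0.
    """
    sites = set(interface)
    X: List[List[float]] = []
    y: List[int] = []
    for centre in range(len(sequence)):
        features: List[float] = []
        for pos in range(centre - half_width, centre + half_width + 1):
            # Positions beyond either terminus use a placeholder that maps to 0.0.
            aa = sequence[pos] if 0 <= pos < len(sequence) else "-"
            features.extend(residue_features(aa))
        X.append(features)
        y.append(1 if centre in sites else 0)
    return X, y


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Compute AU-ROC as P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AU-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X, y = sliding_window_samples("MKTAYIAK", interface=[2, 3], half_width=1)
    print(len(X), len(X[0]))  # 8 3
    print(X[0])               # [0.0, 1.9, -3.9]
    print(y)                  # [0, 0, 1, 1, 0, 0, 0, 0]
    print(roc_auc([0, 1, 0, 1], [0.1, 0.9, 0.4, 0.3]))  # 0.75
```

In a full pipeline, `X` and `y` could be passed to scikit-learn's `RandomForestClassifier`. For example, `class_weight="balanced"` could counter the scarcity of interface residues; this is an assumption, not necessarily the paper's setting. The positive-class probabilities from `predict_proba` could then be scored with `sklearn.metrics.roc_auc_score`, which gives the same value as `roc_auc` above.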

Author information

Correspondence to Michela Quadrini.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Quadrini, M., Cavallin, M., Daberdaku, S., Ferrari, C. (2022). ProSPs: Protein Sites Prediction Based on Sequence Fragments. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2021. Lecture Notes in Computer Science, vol 13163. Springer, Cham. https://doi.org/10.1007/978-3-030-95467-3_41

  • DOI: https://doi.org/10.1007/978-3-030-95467-3_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95466-6

  • Online ISBN: 978-3-030-95467-3

  • eBook Packages: Computer Science, Computer Science (R0)