Abstract
We present results from applying several machine learning algorithms to predict hot-spot and hot-region residues at the protein-protein interface between the polypeptide chains of protein complexes. The dataset consisted of twenty-nine bone morphogenetic proteins (BMPs) obtained from the Protein Data Bank (PDB). The training features were selected from biochemical and biophysical properties of each amino acid at these interfaces: B-factor, hydrophobicity index, prevalence score, accessible surface area (ASA), conservation score, and ground-state energy computed with density functional theory (DFT). We also used parallel CPU/GPU hardware acceleration during preprocessing to reduce the execution time of the ASA and DFT calculations. We evaluated the classifiers with several metrics. The random forest classifier performed best, correctly classifying on average 90% of residues as measured by both the true negative and true positive rates.
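The headline result is stated in terms of the true positive rate (TPR) and true negative rate (TNR). As a minimal sketch, and not the authors' evaluation code, the Python functions below compute both rates from binary residue labels. They assume hot-spot residues are labelled 1 and non-hot-spot residues 0; the function names `confusion_counts` and `rates` are hypothetical. The mean of TPR and TNR is the quantity scikit-learn reports as balanced accuracy for binary problems.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn), assuming 1 = hot-spot residue, 0 = non-hot-spot residue."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1  # hot spot correctly detected
        elif t == 0 and p == 0:
            tn += 1  # non-hot spot correctly rejected
        elif t == 0 and p == 1:
            fp += 1  # non-hot spot flagged as hot spot
        else:
            fn += 1  # hot spot missed
    return tp, tn, fp, fn


def rates(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float, float]:
    """Return (TPR, TNR, mean of the two); an empty class yields a rate of 0.0."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return tpr, tnr, (tpr + tnr) / 2


if __name__ == "__main__":
    # Toy labels for ten interface residues (illustrative only, not data from the paper).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    tpr, tnr, mean = rates(y_true, y_pred)
    print(f"TPR={tpr:.3f} TNR={tnr:.3f} mean={mean:.3f}")  # TPR=0.750 TNR=0.833 mean=0.792
```

Reporting both rates, rather than plain accuracy, matters here because hot spots are usually a minority of interface residues: a classifier that labels everything as non-hot-spot can reach high accuracy while its TPR is zero.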
Acknowledgments
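This study was supported by the “Programa de desarrollo tecnológico e innovación para alumnos del IPN. México 2021” (Technological Development and Innovation Program for IPN Students, Mexico 2021) and by CONACYT (Consejo Nacional de Ciencia y Tecnología, the National Council of Science and Technology).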
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chaparro-Amaro, O., Martínez-Felipe, M., Martínez-Castro, J. (2022). Hot Spots & Hot Regions Detection Using Classification Algorithms in BMPs Complexes at the Protein-Protein Interface with the Ground-State Energy Feature. In: Vergara-Villegas, O.O., Cruz-Sánchez, V.G., Sossa-Azuela, J.H., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Olvera-López, J.A. (eds) Pattern Recognition. MCPR 2022. Lecture Notes in Computer Science, vol 13264. Springer, Cham. https://doi.org/10.1007/978-3-031-07750-0_1
DOI: https://doi.org/10.1007/978-3-031-07750-0_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07749-4
Online ISBN: 978-3-031-07750-0
eBook Packages: Computer Science, Computer Science (R0)