Sequence-Based Random Projection Ensemble Approach to Identify Hotspot Residues from Whole Protein Sequence

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9226)

Abstract

Hot spot residues of proteins are key to performing specific functions in many biological processes. However, identifying hot spots by experimental methods is costly and time-consuming. Computational methods offer an alternative that identifies hot spots from sequential and structural information, but structural information is not always available for a protein. This paper addresses the identification of hot spots using only statistical physicochemical properties of amino acids. First, 34 relatively independent physicochemical properties are extracted from the 544 properties in AAindex1. The hot spot data set is extremely imbalanced: the ratio of the number of hot spots to that of non-hot spots is about 1.4 %. Therefore, the hot spot set and a subset of non-hot spots of roughly the same size form an initial input matrix. A random projection of this matrix provides the input to a REPTree classifier. Several random projections and different subsets of non-hot spots are used to build an ensemble REPTree system. Experimental results on the commonly used hot spot benchmark sets showed that, although our method performed worse, it is a complement to experiments on hot spot determination.
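
The pipeline described in the abstract (balanced subsampling of non-hot spots, a random projection of the feature matrix, one base classifier per projection, and a vote over the ensemble) might be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not specify the projection type, the target dimension, the ensemble size, or the voting rule, so a Gaussian projection, 5 components, 11 members, and a majority vote are assumed here. A nearest-centroid rule stands in for the REPTree classifier so the example stays self-contained, and all function names are hypothetical.

```python
import random


def random_projection_matrix(n_features, n_components, rng):
    # Assumption: Gaussian entries scaled by 1/sqrt(n_components).
    scale = 1.0 / n_components ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
            for _ in range(n_features)]


def project(rows, matrix):
    # Multiply each feature row by the projection matrix.
    n_components = len(matrix[0])
    return [[sum(x * matrix[i][j] for i, x in enumerate(row))
             for j in range(n_components)]
            for row in rows]


def centroid(rows):
    # Column-wise mean of a list of equally long rows.
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_member(hot, non_hot, n_components, rng):
    # Balance the classes: draw as many non-hot spots as there are hot spots.
    subset = rng.sample(non_hot, len(hot))
    matrix = random_projection_matrix(len(hot[0]), n_components, rng)
    # Stand-in base learner (NOT REPTree): one centroid per class
    # in the projected space.
    return matrix, centroid(project(hot, matrix)), centroid(project(subset, matrix))


def train_ensemble(hot, non_hot, n_members=11, n_components=5, seed=0):
    # Each member sees its own projection and its own non-hot-spot subset.
    rng = random.Random(seed)
    return [train_member(hot, non_hot, n_components, rng)
            for _ in range(n_members)]


def predict(ensemble, row):
    # Assumed combination rule: majority vote; returns 1 for "hot spot".
    votes = 0
    for matrix, c_hot, c_non in ensemble:
        z = project([row], matrix)[0]
        if sq_dist(z, c_hot) < sq_dist(z, c_non):
            votes += 1
    return 1 if 2 * votes > len(ensemble) else 0
```

Here `hot` and `non_hot` would be lists of per-residue feature vectors (for instance, 34 values per residue if one value per selected AAindex1 property were used). Swapping the nearest-centroid stand-in for a pruned decision tree would bring the sketch closer to the REPTree ensemble named in the abstract.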

This work was supported by the National Natural Science Foundation of China (Nos. 61300058, 61271098 and 61472282).

Corresponding author

Correspondence to Peng Chen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, P., Hu, S., Wang, B., Zhang, J. (2015). Sequence-Based Random Projection Ensemble Approach to Identify Hotspot Residues from Whole Protein Sequence. In: Huang, DS., Jo, KH., Hussain, A. (eds) Intelligent Computing Theories and Methodologies. ICIC 2015. Lecture Notes in Computer Science, vol 9226. Springer, Cham. https://doi.org/10.1007/978-3-319-22186-1_37

  • DOI: https://doi.org/10.1007/978-3-319-22186-1_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22185-4

  • Online ISBN: 978-3-319-22186-1

  • eBook Packages: Computer Science (R0)