Predicting Protein-Protein Interactions with Weighted PSSM Histogram and Random Forests

  • Conference paper
Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques (IScIDE 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9243)

Abstract

Predicting protein-protein interactions is one of the most important and challenging problems in computational biology. Because the position-specific scoring matrix (PSSM) encodes the evolutionary conservation information of a protein, PSSM-derived features have been widely used in previous studies to predict protein-protein interaction residues. In this paper, we developed a novel method for extracting a feature, called the weighted PSSM histogram, from the PSSM of a protein by borrowing the concept of the histogram from the field of digital image processing. Using the extracted weighted PSSM histogram together with several traditional features, we trained a random forest prediction model. Experimental results on benchmark datasets demonstrated the efficacy of the proposed method.
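The abstract does not define how the weighted PSSM histogram is computed, so the following is only an illustrative sketch of the general idea of a histogram taken over PSSM scores, not the authors' formulation. Everything specific here is an assumption: the function name `weighted_pssm_histogram`, the sliding window around the target residue, the Gaussian distance weights, the score range, and the number of bins.

```python
import math


def weighted_pssm_histogram(pssm, center, half_window=4, n_bins=8,
                            lo=-8.0, hi=8.0, sigma=2.0):
    """Hypothetical weighted histogram of PSSM scores around one residue.

    pssm   : list of rows, one per residue, each a list of 20 scores.
    center : index of the target residue.
    Returns a list of n_bins values that sum to 1.
    """
    hist = [0.0] * n_bins
    bin_width = (hi - lo) / n_bins
    start = max(0, center - half_window)
    stop = min(len(pssm), center + half_window + 1)
    for i in range(start, stop):
        # Assumed weighting: rows nearer the target residue count more.
        w = math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
        for score in pssm[i]:
            # Clip to the assumed score range, then locate the bin.
            clipped = min(max(score, lo), hi)
            b = min(int((clipped - lo) / bin_width), n_bins - 1)
            hist[b] += w
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Toy input: 5 residues with all-zero scores (not real PSSM data).
toy_pssm = [[0] * 20 for _ in range(5)]
features = weighted_pssm_histogram(toy_pssm, center=2)
```

Under these assumptions, each residue yields a fixed-length vector that could be concatenated with other per-residue features and passed to a random forest classifier, for example scikit-learn's `RandomForestClassifier`.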

Acknowledgements

The authors would like to thank the anonymous reviewers for suggestions and comments which helped improve the quality of this paper. This work was supported by the National Natural Science Foundation of China (No. 61373062 and 61233011), the Natural Science Foundation of Jiangsu (No. BK20141403), the Jiangsu Postdoctoral Science Foundation (No. 1201027C), and the China Postdoctoral Science Foundation (No. 2013M530260, 2014T70526).

Author information

Corresponding author

Correspondence to Dong-Jun Yu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wei, ZS., Yang, JY., Yu, DJ. (2015). Predicting Protein-Protein Interactions with Weighted PSSM Histogram and Random Forests. In: He, X., et al. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques. IScIDE 2015. Lecture Notes in Computer Science, vol 9243. Springer, Cham. https://doi.org/10.1007/978-3-319-23862-3_32

  • DOI: https://doi.org/10.1007/978-3-319-23862-3_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23861-6

  • Online ISBN: 978-3-319-23862-3

  • eBook Packages: Computer Science, Computer Science (R0)
