Abstract
The prediction of protein-protein interactions is one of the most important and challenging problems in computational biology. Because the position-specific scoring matrix (PSSM) encodes the evolutionary conservation information of a protein, PSSM-derived features have been widely used in previous studies to predict protein-protein interaction residues. In this paper, we developed a novel feature extraction method, called the weighted PSSM histogram, which derives a feature from the PSSM of a protein by borrowing the concept of the histogram from the field of digital image processing. Based on the extracted weighted PSSM histogram and several traditional features, we trained a random forest prediction model. Experimental results on benchmark datasets demonstrated the efficacy of the proposed method.
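The abstract does not define the weighted PSSM histogram, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a sliding window around a target residue, triangular position weights that decrease with distance from the window center, and fixed score bins; the function name `weighted_pssm_histogram`, the parameters `half_window` and `bin_edges`, and the weighting scheme are all hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def weighted_pssm_histogram(
    pssm: Sequence[Sequence[float]],
    center: int,
    half_window: int = 2,
    bin_edges: Sequence[float] = (-4.0, -2.0, 0.0, 2.0, 4.0),
) -> List[float]:
    """Weighted histogram of PSSM scores in a window around one residue.

    Assumptions (not taken from the paper): triangular position weights,
    fixed bin edges, and scores outside the edges falling into the two
    outer bins.
    """
    hist = [0.0] * (len(bin_edges) + 1)
    for offset in range(-half_window, half_window + 1):
        row_index = center + offset
        if row_index < 0 or row_index >= len(pssm):
            continue  # window runs past the end of the sequence
        # Rows closer to the central residue contribute more.
        weight = 1.0 - abs(offset) / (half_window + 1)
        for score in pssm[row_index]:
            # Bin index = number of edges that are <= score.
            hist[bisect_right(bin_edges, score)] += weight
    total = sum(hist)
    # Normalize so the histogram sums to 1 whenever it is non-empty.
    return [h / total for h in hist] if total > 0 else hist


# Toy PSSM with 3 residues and 2 score columns (a real PSSM has 20).
toy_pssm = [[-5, 0], [1, 3], [5, -3]]
features = weighted_pssm_histogram(toy_pssm, center=1, half_window=1)
```

Under these assumptions, one such vector per residue could be concatenated with other per-residue features and passed to a random forest classifier (for example, scikit-learn's `RandomForestClassifier`).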
Acknowledgements
The authors would like to thank the anonymous reviewers for suggestions and comments which helped improve the quality of this paper. This work was supported by the National Natural Science Foundation of China (No. 61373062 and 61233011), the Natural Science Foundation of Jiangsu (No. BK20141403), the Jiangsu Postdoctoral Science Foundation (No. 1201027C), and the China Postdoctoral Science Foundation (No. 2013M530260, 2014T70526).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wei, ZS., Yang, JY., Yu, DJ. (2015). Predicting Protein-Protein Interactions with Weighted PSSM Histogram and Random Forests. In: He, X., et al. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques. IScIDE 2015. Lecture Notes in Computer Science, vol 9243. Springer, Cham. https://doi.org/10.1007/978-3-319-23862-3_32
DOI: https://doi.org/10.1007/978-3-319-23862-3_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23861-6
Online ISBN: 978-3-319-23862-3
eBook Packages: Computer Science (R0)