Predicting Protein-Protein Interactions with Weighted PSSM Histogram and Random Forests

Wei, Zhi-Sen; Yang, Jing-Yu; Yu, Dong-Jun

doi:10.1007/978-3-319-23862-3_32

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9243)

Included in the following conference series:

International Conference on Intelligent Science and Big Data Engineering

Abstract

The prediction of protein-protein interactions is one of the most important and challenging problems in computational biology. Because the position-specific scoring matrix (PSSM) encodes the evolutionary conservation information of a protein, PSSM-derived features have been widely used in previous studies to predict protein-protein interaction residues. In this paper, we develop a novel feature extraction method, called the weighted PSSM histogram, which derives features from the PSSM of a protein by borrowing the concept of the histogram from digital image processing. Based on the extracted weighted PSSM histogram and several traditional features, we train a random forests prediction model. Experimental results on benchmark datasets demonstrate the efficacy of the proposed method.
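
The abstract does not give the exact definition of the weighted PSSM histogram, so the following is only a minimal sketch of the general idea, under stated assumptions: PSSM scores in a sliding window around a residue are binned as in an image intensity histogram, and rows nearer the central residue contribute more. The function name, the triangular weighting, the window size, the bin count, and the score range are all hypothetical choices made for illustration, not the authors' method.

```python
from typing import List, Sequence


def weighted_pssm_histogram(
    pssm: Sequence[Sequence[float]],
    center: int,
    half_window: int = 2,
    n_bins: int = 4,
    lo: float = -8.0,
    hi: float = 8.0,
) -> List[float]:
    """Normalized histogram of PSSM scores in a window around `center`.

    ASSUMPTION: triangular position weights (rows closer to the central
    residue count more). The paper's actual weighting is not stated in
    the abstract.
    """
    hist = [0.0] * n_bins
    bin_width = (hi - lo) / n_bins
    for offset in range(-half_window, half_window + 1):
        row_idx = center + offset
        if row_idx < 0 or row_idx >= len(pssm):
            continue  # window extends past the sequence end; skip that row
        weight = 1.0 - abs(offset) / (half_window + 1)
        for score in pssm[row_idx]:
            clipped = min(max(score, lo), hi)  # keep score inside [lo, hi]
            b = min(int((clipped - lo) / bin_width), n_bins - 1)
            hist[b] += weight
    total = sum(hist)
    # Normalize so windows truncated at the sequence ends stay comparable.
    return [h / total for h in hist] if total > 0 else hist
```

A per-residue vector produced this way could be concatenated with other features and passed to any random forest implementation (for example, scikit-learn's RandomForestClassifier), which is the kind of classifier the abstract names.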

Acknowledgements

The authors would like to thank the anonymous reviewers for suggestions and comments which helped improve the quality of this paper. This work was supported by the National Natural Science Foundation of China (No. 61373062 and 61233011), the Natural Science Foundation of Jiangsu (No. BK20141403), the Jiangsu Postdoctoral Science Foundation (No. 1201027C), and the China Postdoctoral Science Foundation (No. 2013M530260, 2014T70526).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, China
Zhi-Sen Wei, Jing-Yu Yang & Dong-Jun Yu
Changshu Institute, Nanjing University of Science and Technology, Changshu, 215513, People’s Republic of China
Dong-Jun Yu

Corresponding author

Correspondence to Dong-Jun Yu.

Editor information

Editors and Affiliations

Zhejiang University, Hangzhou, China
Xiaofei He
Xidian University, Xi'an, China
Xinbo Gao
Northwestern Polytechnical University, Xi'an, China
Yanning Zhang
Nanjing University, Nanjing, China
Zhi-Hua Zhou
Chinese Academy of Sciences, Beijing, China
Zhi-Yong Liu
Suzhou University of Science and Technology, Suzhou, China
Baochuan Fu
Suzhou University of Science and Technology, Jiangsu, China
Fuyuan Hu
Suzhou University of Science and Technology, Jiangsu, China
Zhancheng Zhang

About this paper

Cite this paper

Wei, ZS., Yang, JY., Yu, DJ. (2015). Predicting Protein-Protein Interactions with Weighted PSSM Histogram and Random Forests. In: He, X., et al. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques. IScIDE 2015. Lecture Notes in Computer Science, vol 9243. Springer, Cham. https://doi.org/10.1007/978-3-319-23862-3_32

DOI: https://doi.org/10.1007/978-3-319-23862-3_32
Published: 17 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23861-6
Online ISBN: 978-3-319-23862-3
eBook Packages: Computer Science, Computer Science (R0)