Abstract
In this work we address the problem of predicting protein-protein interactions. Solving it can give greater insight into complex diseases such as cancer, and provides valuable information for the study of active small molecules for new drugs by limiting the number of molecules that must be tested in the laboratory. We model the problem as a binary classification task, using a suitable coding of the amino acid sequences. We apply the k-Nearest Neighbors classification algorithm to the classes of interacting and noninteracting proteins. Results show that high prediction accuracy can be achieved in cross-validation. A case study shows that a real network of thousands of interacting proteins can be reconstructed with high accuracy on standard hardware.
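The pipeline summarized in the abstract (encode each protein pair, classify with k-Nearest Neighbors, estimate accuracy by cross-validation) can be illustrated with a minimal sketch. The abstract does not say which sequence coding is used, so the amino acid composition encoding here is an assumption chosen for brevity, not the authors' method. The function names, the Euclidean distance, the value k=3, the leave-one-out scheme, and the toy sequences are likewise hypothetical.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard amino acid in seq (assumed encoding)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def encode_pair(seq_a, seq_b):
    """Concatenate both composition vectors into one 40-dimensional point."""
    return composition(seq_a) + composition(seq_b)


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to query."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k=3):
    """Cross-validation estimate: predict each point from all the others."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return hits / len(xs)


# Made-up toy sequences for illustration only; label 1 = interacting pair.
pairs = [
    ("AAAAAG", "AAAGAA", 1),
    ("AAGAAA", "AAAAGA", 1),
    ("AAAAAA", "GAAAAA", 1),
    ("WWWWWY", "AAAAAA", 0),
    ("WWYWWW", "AAAAAG", 0),
    ("YWWWWW", "AGAAAA", 0),
]
X = [encode_pair(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
```

Calling `knn_predict(X, y, encode_pair(seq_a, seq_b))` classifies a new pair, and `leave_one_out_accuracy(X, y)` gives a cross-validated accuracy estimate. On real data the choice of encoding, distance, and k would need to be tuned, and a brute-force neighbor search like this one scales poorly beyond a few thousand pairs.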
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guarracino, M.R., Nebbia, A. (2010). Predicting Protein-Protein Interactions with K-Nearest Neighbors Classification Algorithm. In: Masulli, F., Peterson, L.E., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2009. Lecture Notes in Computer Science, vol 6160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14571-1_10
DOI: https://doi.org/10.1007/978-3-642-14571-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14570-4
Online ISBN: 978-3-642-14571-1
eBook Packages: Computer Science (R0)