Predicting Protein-Protein Interactions from Multimodal Biological Data Sources via Nonnegative Matrix Tri-Factorization

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2012)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7262)

Abstract

Because high-throughput experimental methods for discovering protein interactions suffer from a high false positive rate, computational methods are crucial for completing the interactome expeditiously. However, building classification models to identify putative protein interactions raises a difficulty: while positive samples can be drawn from protein pairs known to interact, negative samples are very hard to select, because "non-interacting" protein pairs are merely those that currently lack experimental or computational evidence for a physical interaction or functional association, and they could still interact in reality. Instead of relying on heuristics as many existing works do, in this paper we address this difficulty in a principled way by formulating protein interaction prediction from a new mathematical perspective, sparse matrix completion, and we propose a novel Nonnegative Matrix Tri-Factorization (NMTF) based matrix completion approach to predict new protein interactions from existing protein interaction networks. Because matrix completion requires only positive samples and uses no negative samples, the challenge faced by existing classification-based methods for protein interaction prediction is circumvented. Using manifold regularization, we further extend our method to integrate different biological data sources, such as protein sequences, gene expression, and protein structure information. Extensive experimental results on the Saccharomyces cerevisiae genome show that our new methods outperform related state-of-the-art protein interaction prediction methods.
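
As a rough illustration of the matrix completion view described in the abstract, the sketch below factorizes a small nonnegative interaction matrix X as F S G^T using standard multiplicative updates and reads the reconstructed entries as interaction scores. It is not the authors' algorithm: the abstract does not give the update rules, so the function name `nmtf_complete`, the rank, the iteration count, and the toy network (including its unit diagonal) are all assumptions made for illustration, and the manifold regularization used to integrate other data sources is omitted.

```python
import numpy as np


def nmtf_complete(X, rank=2, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical helper: approximate X by F @ S @ G.T with nonnegative factors.

    Uses plain multiplicative updates for the Frobenius objective
    ||X - F S G^T||^2 (no orthogonality or manifold terms).
    Returns the reconstructed score matrix and the error after each iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random nonnegative initialisation keeps all factors nonnegative throughout.
    F = rng.random((n, rank))
    S = rng.random((rank, rank))
    G = rng.random((m, rank))
    errors = []
    for _ in range(n_iter):
        # Each factor is rescaled elementwise; eps guards against division by zero.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F @ S @ G.T, errors


# Toy network (assumed data): two groups of four proteins, fully connected
# within each group, with the known interaction between proteins 0 and 3 hidden.
X = np.zeros((8, 8))
X[:4, :4] = 1.0
X[4:, 4:] = 1.0
X[0, 3] = X[3, 0] = 0.0

scores, errors = nmtf_complete(X, rank=2)
# Symmetrise, since interactions are undirected.
scores = (scores + scores.T) / 2
print("score for hidden pair (0, 3):", round(float(scores[0, 3]), 3))
print("score for cross-group pair (0, 4):", round(float(scores[0, 4]), 3))
```

Only observed interactions enter the input here, and unobserved pairs are simply zeros to be rescored by the low-rank reconstruction. This mirrors the point in the abstract that no explicit negative samples need to be chosen.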

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, H., Huang, H., Ding, C., Nie, F. (2012). Predicting Protein-Protein Interactions from Multimodal Biological Data Sources via Nonnegative Matrix Tri-Factorization. In: Chor, B. (ed.) Research in Computational Molecular Biology. RECOMB 2012. Lecture Notes in Computer Science, vol 7262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29627-7_33

  • DOI: https://doi.org/10.1007/978-3-642-29627-7_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29626-0

  • Online ISBN: 978-3-642-29627-7

  • eBook Packages: Computer Science, Computer Science (R0)
