Multitask Matrix Completion for Learning Protein Interactions Across Diseases

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2016)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9649)

Abstract

Disease-causing pathogens such as viruses introduce their proteins into host cells, where they interact with the host's proteins and enable the virus to replicate inside the host. These interactions between pathogen and host proteins are key to understanding infectious diseases. Often, multiple diseases involve phylogenetically related or biologically similar pathogens. Here we present a multitask learning method to jointly model interactions between human proteins and three different but related viruses: Hepatitis C, Ebola virus and Influenza A. Our multitask matrix completion based model uses a shared low-rank structure in addition to a task-specific sparse structure to incorporate the various interactions. We obtain up to a 39% improvement in predictive performance over prior state-of-the-art models. We show how our model's parameters can be interpreted to reveal both general and specific interaction-relevant characteristics of the viruses. Our code, data and supplement are available at: http://www.cs.cmu.edu/~mkshirsa/bsl_mtl.
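The idea of a shared low-rank structure combined with a task-specific sparse structure can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a bilinear, feature-based scoring form in which task t predicts X (S + G_t) Y_t^T, with a nuclear-norm penalty on the shared matrix S, an L1 penalty on each task-specific matrix G_t, a squared loss on observed entries, and plain proximal gradient descent. All function names, the loss, and the default regularization weights are hypothetical choices made for illustration.

```python
import numpy as np

def svd_threshold(A, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(A, tau):
    """Proximal operator of tau * L1 norm: shrink each entry toward zero by tau."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def objective(tasks, S, Gs, lam_s, lam_g):
    """Masked squared loss plus nuclear norm on S and L1 norm on every G_t."""
    total = lam_s * np.linalg.svd(S, compute_uv=False).sum()
    for (X, Y, M, mask), G in zip(tasks, Gs):
        R = mask * (X @ (S + G) @ Y.T - M)
        total += 0.5 * np.sum(R ** 2) + lam_g * np.abs(G).sum()
    return total

def fit_shared_plus_sparse(tasks, lam_s=0.1, lam_g=0.05, iters=300):
    """Proximal gradient descent over a shared S and per-task G_t.

    tasks: list of (X, Y, M, mask) tuples, one per task (assumed layout):
      X    (n_host, d_h)   host protein features
      Y    (n_path, d_v)   pathogen protein features
      M    (n_host, n_path) interaction scores
      mask (n_host, n_path) 1.0 where M is observed, else 0.0
    The number of proteins may differ per task; feature widths must match.
    """
    d_h, d_v = tasks[0][0].shape[1], tasks[0][1].shape[1]
    S = np.zeros((d_h, d_v))
    Gs = [np.zeros((d_h, d_v)) for _ in tasks]
    # Conservative step size from an upper bound on the gradient's Lipschitz constant.
    lipschitz = 2.0 * sum(
        np.linalg.norm(X, 2) ** 2 * np.linalg.norm(Y, 2) ** 2 for X, Y, _, _ in tasks
    )
    step = 1.0 / lipschitz
    for _ in range(iters):
        grads = []
        for (X, Y, M, mask), G in zip(tasks, Gs):
            R = mask * (X @ (S + G) @ Y.T - M)  # residual on observed entries only
            grads.append(X.T @ R @ Y)
        # The shared part sees the gradient from every task; each G_t sees only its own.
        S_next = svd_threshold(S - step * sum(grads), step * lam_s)
        Gs = [soft_threshold(G - step * g, step * lam_g) for G, g in zip(Gs, grads)]
        S = S_next
    return S, Gs
```

In this sketch, the singular-value shrinkage pushes S toward low rank, so it captures what the tasks have in common, while the entrywise shrinkage keeps each G_t sparse, so it holds only the few feature interactions that a single task needs beyond the shared part.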

Notes

  1. The dimensions being different does not influence the method or the optimization in any way.

  2. Since we use data from several strains for each task, the PPI data contains some interactions that are interologs. Please see the supplementary Sect. S4 for details.

  3. For details of these classes, please refer to the supplementary material or the original paper.

Corresponding author

Correspondence to Meghana Kshirsagar.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Kshirsagar, M., Carbonell, J.G., Klein-Seetharaman, J., Murugesan, K. (2016). Multitask Matrix Completion for Learning Protein Interactions Across Diseases. In: Singh, M. (ed.) Research in Computational Molecular Biology. RECOMB 2016. Lecture Notes in Computer Science, vol 9649. Springer, Cham. https://doi.org/10.1007/978-3-319-31957-5_4

  • DOI: https://doi.org/10.1007/978-3-319-31957-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31956-8

  • Online ISBN: 978-3-319-31957-5

  • eBook Packages: Computer Science, Computer Science (R0)
