Predicting Associations Between Proteins and Multiple Diseases

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2020)

Abstract

We formulate the prediction of protein-disease associations as a multi-label classification task. We apply both problem transformation methods (binary relevance), which are local approaches, and algorithm adaptation methods (predictive clustering trees), which are global approaches. In both cases, we use methods for learning individual trees and tree ensembles (random forests). We compare the predictive performance of the local and global approaches, as well as that of the different feature sets used to represent the proteins.
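The difference between the local and the global setting can be illustrated with a minimal sketch. It is not the authors' implementation (footnote 2 gives the software link for that): it uses depth-1 trees, a toy dataset invented for illustration, and size-weighted Gini impurity summed over labels as an assumed split heuristic. All function and variable names are hypothetical. Ensembles are omitted for brevity.

```python
from typing import List, Sequence, Tuple

# (feature index, threshold, left-leaf prediction, right-leaf prediction)
Stump = Tuple[int, float, List[int], List[int]]


def gini(column: Sequence[int]) -> float:
    """Gini impurity 2p(1-p) of a 0/1 label column (0.0 if empty)."""
    if not column:
        return 0.0
    p = sum(column) / len(column)
    return 2.0 * p * (1.0 - p)


def partition(X, Y, feature, threshold):
    """Label rows that fall left (<= threshold) and right (> threshold)."""
    left = [y for x, y in zip(X, Y) if x[feature] <= threshold]
    right = [y for x, y in zip(X, Y) if x[feature] > threshold]
    return left, right


def fit_stump(X, Y, label_ids) -> Stump:
    """Depth-1 tree minimizing size-weighted impurity summed over label_ids."""
    best = None
    for f in range(len(X[0])):
        # Dropping the largest value keeps both sides of the split non-empty.
        for t in sorted({x[f] for x in X})[:-1]:
            parts = partition(X, Y, f, t)
            cost = sum(
                len(part) * gini([y[j] for y in part])
                for part in parts
                for j in label_ids
            )
            if best is None or cost < best[0]:
                best = (cost, f, t, parts)
    if best is None:
        raise ValueError("no feature takes more than one value")
    _, f, t, (left, right) = best

    def leaf(rows):
        # Per-label majority vote; ties are resolved toward 1.
        return [int(2 * sum(y[j] for y in rows) >= len(rows)) for j in label_ids]

    return f, t, leaf(left), leaf(right)


def predict(stump: Stump, x) -> List[int]:
    f, t, left_leaf, right_leaf = stump
    return left_leaf if x[f] <= t else right_leaf


# Toy data (invented): 4 "proteins", 2 features, 2 "disease" labels.
X = [(0.1, 0.9), (0.2, 0.1), (0.8, 0.8), (0.9, 0.2)]
Y = [(0, 1), (0, 0), (1, 1), (1, 0)]
n_labels = len(Y[0])

# Local approach (binary relevance): one independent model per label.
local_models = [fit_stump(X, Y, [j]) for j in range(n_labels)]


def predict_local(x) -> List[int]:
    return [predict(model, x)[0] for model in local_models]


# Global approach: a single model whose leaves predict all labels at once.
global_model = fit_stump(X, Y, list(range(n_labels)))


def predict_global(x) -> List[int]:
    return predict(global_model, x)
```

In the local setting the number of models grows with the number of labels and each model may split on a different feature. In the global setting a single tree is learned and every split has to serve all labels jointly.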

Notes

  1. http://kt.ijs.si/martin_breskvar/data/ismis2020

  2. http://source.ijs.si/ktclus/clus-public/

Acknowledgements

We acknowledge the support of the Slovenian Research Agency (grants P2-0103 and N2-0128), the European Commission (grant HBP, The Human Brain Project SGA2), and the ERDF (Interreg Slovenia-Italy project TRAIN). The computational experiments were executed on the computing infrastructure of the Slovenian Grid (SLING) initiative.

Author information

Corresponding author

Correspondence to Martin Breskvar.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Breskvar, M., Džeroski, S. (2020). Predicting Associations Between Proteins and Multiple Diseases. In: Helic, D., Leitner, G., Stettinger, M., Felfernig, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2020. Lecture Notes in Computer Science, vol 12117. Springer, Cham. https://doi.org/10.1007/978-3-030-59491-6_36

  • DOI: https://doi.org/10.1007/978-3-030-59491-6_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59490-9

  • Online ISBN: 978-3-030-59491-6

  • eBook Packages: Computer Science, Computer Science (R0)
