Abstract
We formulate the prediction of protein-disease associations as a multi-label classification task. We apply both problem transformation methods (binary relevance), i.e., local approaches, and algorithm adaptation methods (predictive clustering trees), i.e., global approaches. In both cases, we use methods for learning individual trees and tree ensembles (random forests). We compare the predictive performance of the local and global approaches on the one hand, and of the different feature sets used to represent the proteins on the other.
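To make the local/global distinction concrete, the following is a minimal sketch and not the method used in the chapter. It assumes depth-one trees (stumps) in place of full trees, a size-weighted sum of per-label variances as the split heuristic (in the spirit of predictive clustering trees, but much simplified), and a made-up toy dataset of four "proteins" with two features and three "disease" labels. All function names and data values are hypothetical.

```python
from typing import List, Sequence, Tuple

# A stump is (feature index, threshold, left prediction, right prediction).
Stump = Tuple[int, float, List[int], List[int]]


def majority(Y: Sequence[Sequence[int]]) -> List[int]:
    """Per-label majority vote over the rows of Y (ties go to 1)."""
    n = len(Y)
    return [1 if 2 * sum(y[j] for y in Y) >= n else 0 for j in range(len(Y[0]))]


def impurity(Y: Sequence[Sequence[int]]) -> float:
    """Size-weighted sum over labels of the binary variance p * (1 - p)."""
    n = len(Y)
    total = 0.0
    for j in range(len(Y[0])):
        p = sum(y[j] for y in Y) / n
        total += p * (1.0 - p)
    return n * total


def fit_stump(X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]]) -> Stump:
    """Exhaustively pick the single split that minimises the summed impurity."""
    best_score, best = None, None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [y for x, y in zip(X, Y) if x[f] <= t]
            right = [y for x, y in zip(X, Y) if x[f] > t]
            if not left or not right:
                continue  # a split must leave examples on both sides
            score = impurity(left) + impurity(right)
            if best_score is None or score < best_score:
                best_score, best = score, (f, t, majority(left), majority(right))
    if best is None:  # no valid split: predict the overall majority everywhere
        return (0, float("inf"), majority(Y), majority(Y))
    return best


def predict_stump(stump: Stump, x: Sequence[float]) -> List[int]:
    f, t, left, right = stump
    return list(left if x[f] <= t else right)


def fit_local(X, Y) -> List[Stump]:
    """Local approach (binary relevance): one single-label model per label."""
    return [fit_stump(X, [[y[j]] for y in Y]) for j in range(len(Y[0]))]


def predict_local(stumps: List[Stump], x: Sequence[float]) -> List[int]:
    return [predict_stump(s, x)[0] for s in stumps]


# Hypothetical toy data: rows are proteins, columns of Y are disease labels.
X = [[0.1, 1.0], [0.2, 0.0], [0.8, 1.0], [0.9, 0.0]]
Y = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]

local_models = fit_local(X, Y)   # three models, one per label
global_model = fit_stump(X, Y)   # global approach: one model for all labels

print(predict_local(local_models, [0.1, 1.0]))
print(predict_stump(global_model, [0.1, 1.0]))
```

The local variant trains as many models as there are labels, and each model may split on a different feature. The global variant trains a single model whose split is chosen for all labels jointly and whose leaves store a whole label vector. Nothing about the relative performance of the two approaches should be read off this toy example.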
Acknowledgements
We acknowledge the support of the Slovenian Research Agency (grants P2-0103 and N2-0128), the European Commission (grant HBP, The Human Brain Project SGA2), and the ERDF (Interreg Slovenia-Italy project TRAIN). The computational experiments were executed on the computing infrastructure of the Slovenian Grid (SLING) initiative.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Breskvar, M., Džeroski, S. (2020). Predicting Associations Between Proteins and Multiple Diseases. In: Helic, D., Leitner, G., Stettinger, M., Felfernig, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2020. Lecture Notes in Computer Science(), vol 12117. Springer, Cham. https://doi.org/10.1007/978-3-030-59491-6_36
DOI: https://doi.org/10.1007/978-3-030-59491-6_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59490-9
Online ISBN: 978-3-030-59491-6
eBook Packages: Computer Science, Computer Science (R0)