DOI: 10.1145/3340531.3412776
research-article

BioKG: A Knowledge Graph for Relational Learning On Biological Data

Published: 19 October 2020

ABSTRACT

Knowledge graphs have become a popular means of modeling complex biological systems, where they capture the interactions between biological entities and their effects on the system. They also support relational learning models, which are known to provide highly scalable and accurate predictions of associations between biological entities. Despite the success of combining biological knowledge graphs with relational learning models in biological predictive tasks, unified biological knowledge graph resources are lacking. As a result, current efforts to apply relational learning models to biological data have each had to compile and build their own knowledge graphs from open biological databases. This process is often performed inconsistently across such efforts, especially in the choice of original resources, the alignment of identifiers across databases, and the assessment of the quality of the included data. To make relational learning on biomedical data more standardised and reproducible, we propose a new biological knowledge graph that compiles curated relational data from open biological databases in a unified format with common, interlinked identifiers. We also provide a new module for mapping identifiers and labels from different databases, which can be used to align our knowledge graph with biological data from other heterogeneous sources. Finally, to illustrate the practical relevance of our work, we provide a set of benchmarks based on the presented data that can be used to train and assess relational learning models in various tasks related to pathway and drug discovery.
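To make the idea of "a unified format with common, interlinked identifiers" concrete, the following minimal sketch shows how triples from two sources could be aligned through an identifier map before being merged. It is an illustration only, not the authors' module: the identifiers (`dbA:g1`, `dbB:x9`, `kg:P1`, and so on), the relation names, and the `align` helper are all hypothetical.

```python
from typing import Dict, List, Tuple

# A knowledge graph fact as a (subject, predicate, object) triple.
Triple = Tuple[str, str, str]

# Hypothetical map from source-specific identifiers to one shared namespace.
ID_MAP: Dict[str, str] = {
    "dbA:g1": "kg:P1",  # the same protein as named in source A
    "dbB:x9": "kg:P1",  # ... and as named in source B
    "dbA:w3": "kg:W1",  # a pathway
    "dbB:d7": "kg:D1",  # a drug
}


def align(triples: List[Triple], id_map: Dict[str, str]) -> List[Triple]:
    """Rewrite entity identifiers through id_map and drop duplicate triples.

    Identifiers missing from id_map are left unchanged; first-seen order is kept.
    """
    seen = set()
    aligned: List[Triple] = []
    for subj, pred, obj in triples:
        triple = (id_map.get(subj, subj), pred, id_map.get(obj, obj))
        if triple not in seen:
            seen.add(triple)
            aligned.append(triple)
    return aligned


# Two hypothetical sources stating overlapping facts under different identifiers.
source_a: List[Triple] = [("dbA:g1", "in_pathway", "dbA:w3")]
source_b: List[Triple] = [
    ("dbB:x9", "in_pathway", "dbA:w3"),  # same fact as in source A
    ("dbB:d7", "targets", "dbB:x9"),
]

merged = align(source_a + source_b, ID_MAP)
for triple in merged:
    print(triple)
```

After alignment the duplicated pathway fact collapses into a single triple, which is the kind of consistency that ad hoc, per-study graph construction tends to lose.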

Supplemental Material

3340531.3412776.mp4 (mp4, 81.9 MB)

Published in

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN: 9781450368599
DOI: 10.1145/3340531

Copyright © 2020 ACM

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
