research-article

BioKG: A Knowledge Graph for Relational Learning On Biological Data

Authors:
Brian Walsh, NUI Galway Insight Centre for Data Analytics, Galway, Ireland
Sameh K. Mohamed, NUI Galway Insight Centre for Data Analytics, Galway, Ireland
Vít Nováček, NUI Galway Insight Centre for Data Analytics, Galway, Ireland

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, October 2020, Pages 3173–3180
https://doi.org/10.1145/3340531.3412776

Published: 19 October 2020

ABSTRACT

Knowledge graphs have become a popular means of modeling complex biological systems, capturing the interactions between biological entities and their effects on the system as a whole. They also support relational learning models, which are known to provide highly scalable and accurate predictions of associations between biological entities. Despite the success of combining biological knowledge graphs with relational learning models in biological predictive tasks, unified biological knowledge graph resources are lacking. As a result, every current effort to apply relational learning to biological data has had to compile and build its own knowledge graph from open biological databases. This process is often performed inconsistently across such efforts, especially in the choice of original resources, the alignment of identifiers across databases, and the assessment of the quality of the included data. To make relational learning on biomedical data more standardised and reproducible, we propose a new biological knowledge graph that compiles curated relational data from open biological databases in a unified format with common, interlinked identifiers. We also provide a new module for mapping identifiers and labels from different databases, which can be used to align our knowledge graph with biological data from other heterogeneous sources. Finally, to illustrate the practical relevance of our work, we provide a set of benchmarks based on the presented data that can be used to train and assess relational learning models on various tasks related to pathway and drug discovery.
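
To make the idea of aligning identifiers across databases concrete, here is a minimal sketch in Python. It is not the module described in the abstract: the identifier strings, the `ID_MAP` table, the `targets` relation name, and the `align_triples` function are all hypothetical and chosen only for illustration. The sketch assumes the knowledge graph is stored as (head, relation, tail) triples and that a lookup table from source-specific identifiers to unified identifiers is available.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical lookup table: two source databases name the same protein
# differently, and both names resolve to one unified identifier.
ID_MAP: Dict[str, str] = {
    "source_a:P1": "unified:protein_1",
    "source_b:prot-001": "unified:protein_1",
    "source_a:D7": "unified:drug_7",
}


def align_triples(
    triples: Iterable[Triple], id_map: Dict[str, str]
) -> Tuple[List[Triple], List[Triple]]:
    """Rewrite entity identifiers to unified ones and drop duplicates.

    Returns the aligned triples and, separately, the triples that
    mention at least one identifier missing from the lookup table.
    """
    aligned: List[Triple] = []
    unmapped: List[Triple] = []
    seen = set()
    for head, relation, tail in triples:
        if head not in id_map or tail not in id_map:
            # Keep unmappable triples aside instead of guessing an identifier.
            unmapped.append((head, relation, tail))
            continue
        triple = (id_map[head], relation, id_map[tail])
        if triple not in seen:
            seen.add(triple)
            aligned.append(triple)
    return aligned, unmapped


# Illustrative input: the first two triples state the same fact using
# identifiers from different sources; the third has an unknown identifier.
RAW: List[Triple] = [
    ("source_a:D7", "targets", "source_a:P1"),
    ("source_a:D7", "targets", "source_b:prot-001"),
    ("source_a:D7", "targets", "source_b:unknown"),
]

ALIGNED, UNMAPPED = align_triples(RAW, ID_MAP)
```

In this sketch the two equivalent source triples collapse into a single unified triple, while the triple with the unknown identifier is reported separately so that its quality can be assessed rather than silently discarded.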

Supplemental Material

3340531.3412776.mp4 (mp4, 81.9 MB)

Index Terms

BioKG: A Knowledge Graph for Relational Learning On Biological Data
1. Applied computing
  1. Life and medical sciences
    1. Bioinformatics
    2. Computational biology
      1. Biological networks
2. Information systems
  1. Data management systems
    1. Information integration
      1. Extraction, transformation and loading


Published in
CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN:9781450368599
DOI:10.1145/3340531
General Chairs:
Mathieu d'Aquin, DSI, Insight, NUI Galway, Ireland
Stefan Dietze, GESIS, Cologne, Germany; Heinrich-Heine-University Düsseldorf, Germany; L3S Research Center, Germany
Program Chairs:
Claudia Hauff, TU Delft, The Netherlands
Edward Curry, DSI, Insight, NUI Galway, Ireland
Philippe Cudre Mauroux, eXascale, University of Fribourg, Switzerland
Copyright © 2020 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bioinformatics
biological knowledge graphs
knowledge graph embedding
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%