Abstract
Biomedical data are complex and heterogeneous, and a sufficiently large, reliable body of data is important for understanding and exploring the domain. This work integrates biomedical data from heterogeneous sources, such as dictionaries and corpora, and consolidates them into a uniform format for easier access by end users such as biologists, pharmacists, and data scientists. The proposed integrated biomedical knowledge base, BIOINTMED, has 11,299, 12,981, 4,428, 61,491, 48,663, and 13,146 unique entities for drugs, diseases, targets, genes, biomedical pathways, and adverse events, respectively. The uniform aggregated collection is also used to study the interactions among pairs of these entities. Finally, a complete statistical analysis of the consolidated biomedical entities is provided.
Funding
This work is financially supported by the project “Effective Drug Repurposing through literature and patent mining, data integration and development of systems pharmacology platform” sponsored by MHRD, India and Excelra Knowledge Solutions, Hyderabad.
Appendix. Queries on BIOINTMED
We executed several real-time queries on BIOINTMED and recorded the response time of each. Results for some example queries are given below.
1. Question 1: “Find characteristics of the drug Lepirudin”
   Database query: DrugTable.find_by(name: 'Lepirudin').pluck(:name, :description, :state, :group, :pharmacodynamics, :organism)
   Response time: 44.9 ms
   Total records found: 1
   〈 name: “Lepirudin”, description: “Lepirudin is identical to natural hirudin except f...”, state: “liquid”, group: “[‘approved’]”, pharmacodynamic: “Lepirudin is used to break up clots and to reduce ...”, organism: “[‘Humans and other mammals’]” 〉
2. Question 2: “Find the genes related to the Alzheimer disease”
   Database query: DiseaseTable.find_by(disease_name: 'Alzheimer disease').gene_tables
   Response time: 38.8 ms
   Total records found: 87
   A few typical records:
   〈 GeneTable id: 2, full_name: “alpha-2-macroglobulin”, source: “MIM:103950|HGNC:HGNC:7|Ensembl:ENSG00000175899”, chromosome: “12”, locus: “12p13.31”, symbol: “A2M”, synonyms: “A2MD|CPAMD5|FWP007|S863-7”, organism: “Homo sapiens”, gene_type: “protein-coding” 〉
   〈 GeneTable id: 37, full_name: “acetylcholinesterase (Cartwright blood group)”, source: “MIM:100740|HGNC:HGNC:108|Ensembl:ENSG00000087085”, chromosome: “7”, locus: “7q22.1”, symbol: “ACHE”, synonyms: “ACEE|ARACHE|N-ACHE|YT”, organism: “Homo sapiens”, gene_type: “protein-coding” 〉
   〈 GeneTable id: 222, full_name: “autocrine motility factor receptor”, source: “MIM:603243|HGNC:HGNC:463|Ensembl:ENSG00000159461”, chromosome: “16”, locus: “16q13”, symbol: “AMFR”, synonyms: “GP78|RNF45”, organism: “Homo sapiens”, gene_type: “protein-coding” 〉
   〈 GeneTable id: 229, full_name: “bridging integrator 1”, source: “MIM:601248|HGNC:HGNC:1052|Ensembl:ENSG00000136717”, chromosome: “2”, locus: “2q14.3”, symbol: “BIN1”, synonyms: “AMPH2|AMPHL|CNM2|SH3P9”, organism: “Homo sapiens”, gene_type: “protein-coding” 〉
3. Question 3: “Find the biomedical pathway associated with the drug Calcium”
   Database query: BiomedicalPathway.eager_load(:drug_tables).where(drug_tables: { name: 'calcium' })
   Response time: 124 ms
   Total records found: 364
   A few typical records:
   〈 BiomedicalPathway id: 6, pathway_name: “Alpha Linolenic Acid and Linoleic Acid Metabolism”, pathway_label: “Metabolic”, description: “Linoleic acid is a member of essential fatty acids...”, other_id: “SMP0000018” 〉
   〈 BiomedicalPathway id: 11, pathway_name: “beta-Alanine Metabolism”, pathway_label: “Metabolic”, description: “beta-Alanine is formed by the proteolytic degradat...”, other_id: “SMP0000007” 〉
   〈 BiomedicalPathway id: 24, pathway_name: “Folate Metabolism”, pathway_label: “Metabolic”, description: “The 1-carbon transformations require folic acid (f...”, other_id: “SMP0000053” 〉
   〈 BiomedicalPathway id: 30, pathway_name: “Malate-Aspartate Shuttle”, pathway_label: “Metabolic”, description: “The malate-aspartate shuttle (also known as the ma...”, other_id: “SMP0000129” 〉
4. Question 4: “Find all the drugs related to the adverse event Nausea”
   Database query: DrugTable.joins(:adverse_events).where(adverse_events: { adverse_event: 'Nausea' }).pluck(:name, :state, :group)
   Response time: 500.8 ms
   Total records found: 798
   A few typical records:
   〈 “Lepirudin”, “liquid”, “[‘approved’]” 〉
   〈 “Bivalirudin”, “solid”, “[‘approved’, ‘investigational’]” 〉
   〈 “Salmon Calcitonin”, “liquid”, “[‘approved’, ‘investigational’]” 〉
   〈 “Insulin Lispro”, “liquid”, “[‘approved’]” 〉
   〈 “Cetrorelix”, “solid”, “[‘approved’, ‘investigational’]” 〉
5. Question 5: “Find the drugs associated with the adverse event Rash for one particular physiological system, such as Sensory organs”
   Database query: DrugTable.joins(:adverse_events, :physiology_tables).where(adverse_events: { adverse_event: 'Rash' }).where(physiology_tables: { physiological_system: 'Sensory organ' }).pluck(:name)
   Response time: 58.8 ms
   Total records found: 56
   A few typical records:
   〈 “Azithromycin” 〉
   〈 “Moxifloxacin” 〉
   〈 “Alclometasone” 〉
   〈 “Sulfisoxazole” 〉
   〈 “Indomethacin” 〉
   〈 “Timolol” 〉
6. Question 6: “Find all genes related to the disease Fever”
   Database query: DiseaseTable.find_by(disease_name: 'Fever').gene_tables
   Response time: 248.1 ms
   Total records found: 26
   A few typical records:
   〈 GeneTable id: 724, full_name: “cholecystokinin”, source: “MIM:118440|HGNC:HGNC:1569|Ensembl:ENSG00000187094”, chromosome: “3”, locus: “3p22.1”, symbol: “CCK”, synonyms: “-”, organism: “Homo sapiens”, gene_type: “protein-coding” 〉
   〈 GeneTable id: 1136, full_name: “corticotropin releasing hormone”, source: “MIM:122560|HGNC:HGNC:2355|Ensembl:ENSG00000147571”, chromosome: “8”, locus: “8q13.1”, symbol: “CRH”, synonyms: “CRF|CRH1”, organism: “Homo sapiens”, gene_type: “protein-coding” 〉
   〈 GeneTable id: 1177, full_name: “colony stimulating factor 3”, source: “MIM:138970|HGNC:HGNC:2438|Ensembl:ENSG00000108342”, chromosome: “17”, locus: “17q21.1”, symbol: “CSF3”, synonyms: “C17orf33|CSF3OS|GCSF”, organism: “Homo sapiens”, gene_type: “protein-coding” 〉
   〈 GeneTable id: 2029, full_name: “follicle stimulating hormone subunit beta”, source: “MIM:136530|HGNC:HGNC:3964|Ensembl:ENSG00000131808”, chromosome: “11”, locus: “11p14.1”, symbol: “FSHB”, synonyms: “HH24”, organism: “Homo sapiens”, gene_type: “protein-coding” 〉
   〈 GeneTable id: 2590, full_name: “high mobility group box 1”, source: “MIM:163905|HGNC:HGNC:4983|Ensembl:ENSG00000189403”, chromosome: “13”, locus: “13q12.3”, symbol: “HMGB1”, synonyms: “HMG-1|HMG1|HMG3|SBP-1”, organism: “Homo sapiens”, gene_type: “protein-coding” 〉
7. Question 7: “Find the molecular formula and isomeric SMILES for the drug named Bivalirudin”
   Database query: DrugChemical.joins(:drug_tables).where(drug_tables: { name: "Bivalirudin" }).pluck(:molecular_formula, :isomeric_smiles).flatten
   Response time: 22 ms
   Total records found: 1
   〈 “C98H138N24O33”, “CC[C@H](C)[C@@H](C(=O)N1CCC[C@H]1C(=N[C@@H](CCC(=O)O)C(=N[C@@H](CCC(=O)O)C(=N[C@@H](CC2=CC=C(C=C2)O)C(=N[C@@H](CC(C)C)C(=O)O)O)O)O)O)N=C([C@H](CCC(=O)O)N=C([C@H](CCC(=O)O)N=C((CC3=CC=CC=C3)N=C([C@H](CC(=O)O)N=C(CN=C([C@H](CC(=N)O)N=C(CN=C(CN=C(CN=C(CN=C([C@@H]4CCCN4C(=O)[C@H](CCCNC(=N)N)N=C([C@@H]5CCCN5C(=O)[C@@H](CC6=CC=CC=C6)N)O)O)O)O)O)O)O)O)O)O)O)O” 〉
8. Question 8: “Find all drug names containing amino acid in their subclass”
   Database query: DrugTable.joins(:drug_classification).where("drug_classifications.d_subclass like '%Amino Acids%'").pluck(:name).uniq
   Response time: 11.2 ms
   Total records found: 1813
   A few typical records:
   〈 “Lepirudin” 〉
   〈 “Cetuximab” 〉
   〈 “Dornase alfa” 〉
   〈 “Denileukin diftitox” 〉
   〈 “Etanercept” 〉
   〈 “Leuprolide” 〉
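The association queries above (for example Question 4) traverse a many-to-many link between drugs and adverse events. The following is a minimal in-memory sketch of that lookup and of how a response time might be measured, using only the Ruby standard library. The struct names, the join-table layout, and the `drugs_for_adverse_event` helper are assumptions made for illustration; the actual BIOINTMED schema and timing method are not described in this appendix. The first two drug rows are copied from the Question 4 output, and `ExampleDrug` is a hypothetical placeholder.

```ruby
require 'benchmark'
require 'set'

# Hypothetical miniature tables; not the real BIOINTMED schema.
Drug = Struct.new(:id, :name, :state, :group)
AdverseEvent = Struct.new(:id, :adverse_event)

DRUGS = [
  Drug.new(1, 'Lepirudin', 'liquid', ['approved']),
  Drug.new(2, 'Bivalirudin', 'solid', ['approved', 'investigational']),
  Drug.new(3, 'ExampleDrug', 'solid', ['experimental']) # placeholder row
].freeze

ADVERSE_EVENTS = [
  AdverseEvent.new(10, 'Nausea'),
  AdverseEvent.new(11, 'Rash')
].freeze

# Assumed join table: one [drug_id, adverse_event_id] pair per association.
DRUG_ADVERSE_EVENTS = [[1, 10], [2, 10], [3, 11]].freeze

# Roughly mirrors joins(:adverse_events).where(...).pluck(:name, :state, :group):
# resolve the event, follow the join rows, then project three columns.
def drugs_for_adverse_event(event_name)
  event_ids = ADVERSE_EVENTS.select { |e| e.adverse_event == event_name }
                            .map(&:id).to_set
  drug_ids = DRUG_ADVERSE_EVENTS.select { |_, ae_id| event_ids.include?(ae_id) }
                                .map(&:first).to_set
  DRUGS.select { |d| drug_ids.include?(d.id) }
       .map { |d| [d.name, d.state, d.group] }
end

rows = nil
# Benchmark.realtime returns elapsed wall-clock seconds; convert to ms.
elapsed_ms = Benchmark.realtime { rows = drugs_for_adverse_event('Nausea') } * 1000
puts "Total records found: #{rows.size}"
puts format('Response time: %.3f ms', elapsed_ms)
```

In a relational database the same traversal would be a join through the association table, and the measured time would also include query planning and I/O, so timings from this toy version are not comparable to the response times reported above.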
Cite this article
Saha, A., Mukhopadhyay, J., Sarkar, S. et al. BIOINTMED: integrated biomedical knowledge base with ontologies and clinical trials. Med Biol Eng Comput 58, 2339–2354 (2020). https://doi.org/10.1007/s11517-020-02201-0
DOI: https://doi.org/10.1007/s11517-020-02201-0