BIOINTMED: integrated biomedical knowledge base with ontologies and clinical trials

  • Original Article
Medical & Biological Engineering & Computing

Abstract

Biomedical data are complex and heterogeneous. A sufficiently large and reliable body of data is essential for understanding and exploring the domain. This work integrates biomedical data from heterogeneous sources, such as dictionaries and corpora, and consolidates them into a uniform format that is easier to access for end users such as biologists, pharmacists, and data scientists. The proposed integrated biomedical knowledge base, BIOINTMED, has 11,299, 12,981, 4,428, 61,491, 48,663, and 13,146 unique entities for drugs, diseases, targets, genes, biomedical pathways, and adverse events, respectively. The uniform aggregated collection is also used to study the interactions among pairs of these entities. Finally, a complete statistical analysis of the consolidated biomedical entities is provided.

Funding

This work is financially supported by the project “Effective Drug Repurposing through literature and patent mining, data integration and development of systems pharmacology platform” sponsored by MHRD, India and Excelra Knowledge Solutions, Hyderabad.

Author information

Corresponding author

Correspondence to Ankita Saha.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Queries on BIOINTMED

We executed several real-time queries on BIOINTMED and recorded the response time for each. Results for some example queries are given below.
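As a minimal sketch of how such response times might be recorded, the snippet below wraps a lookup in a monotonic-clock timer. The helper `timed_ms` and the in-memory `SAMPLE_DRUGS` array are hypothetical stand-ins introduced only for illustration; the source does not state how the reported timings were measured, and the real queries run against the BIOINTMED database rather than an array.

```ruby
# Hypothetical helper (not from the paper): runs the given block and
# returns [result, elapsed time in milliseconds rounded to one decimal].
def timed_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  finished = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  [result, ((finished - started) * 1000.0).round(1)]
end

# Stand-in for a database table: a tiny in-memory list (illustrative only).
SAMPLE_DRUGS = [
  { name: "Lepirudin", state: "liquid" },
  { name: "Bivalirudin", state: "solid" }
].freeze

# Time a simple lookup by name, mirroring the shape of the reported results.
records, ms = timed_ms { SAMPLE_DRUGS.select { |d| d[:name] == "Lepirudin" } }
puts "Response time: #{ms} ms"
puts "Total records found: #{records.size}"
```

A monotonic clock is used here because it is unaffected by system clock adjustments, which matters when the measured intervals are only a few milliseconds long.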

  1. Question 1: “Find characteristics of the drug Lepirudin”

    Database query: DrugTable.find_by(name: 'Lepirudin').pluck(:name, :description, :state, :group, :pharmacodynamics, :organism)

    Response time: 44.9 ms

    Total records found: 1

    〈 name: "Lepirudin", description: "Lepirudin is identical to natural hirudin except f...", state: "liquid", group: "['approved']", pharmacodynamic: "Lepirudin is used to break up clots and to reduce ...", organism: "['Humans and other mammals']" 〉

  2. Question 2: “Find the genes related to Alzheimer disease”

    Database query: DiseaseTable.find_by(disease_name: 'Alzheimer disease').gene_tables

    Response time: 38.8 ms

    Total records found: 87

    A few typical records:

    〈 GeneTable id: 2, full_name: "alpha-2-macroglobulin", source: "MIM:103950|HGNC:HGNC:7|Ensembl:ENSG00000175899", chromosome: "12", locus: "12p13.31", symbol: "A2M", synonyms: "A2MD|CPAMD5|FWP007|S863-7", organism: "Homo sapiens", gene_type: "protein-coding" 〉

    〈 GeneTable id: 37, full_name: "acetylcholinesterase (Cartwright blood group)", source: "MIM:100740|HGNC:HGNC:108|Ensembl:ENSG00000087085", chromosome: "7", locus: "7q22.1", symbol: "ACHE", synonyms: "ACEE|ARACHE|N-ACHE|YT", organism: "Homo sapiens", gene_type: "protein-coding" 〉

    〈 GeneTable id: 222, full_name: "autocrine motility factor receptor", source: "MIM:603243|HGNC:HGNC:463|Ensembl:ENSG00000159461", chromosome: "16", locus: "16q13", symbol: "AMFR", synonyms: "GP78|RNF45", organism: "Homo sapiens", gene_type: "protein-coding" 〉

    〈 GeneTable id: 229, full_name: "bridging integrator 1", source: "MIM:601248|HGNC:HGNC:1052|Ensembl:ENSG00000136717", chromosome: "2", locus: "2q14.3", symbol: "BIN1", synonyms: "AMPH2|AMPHL|CNM2|SH3P9", organism: "Homo sapiens", gene_type: "protein-coding" 〉

  3. Question 3: “Find the biomedical pathways associated with the drug Calcium”

    Database query: BiomedicalPathway.eager_load(:drug_tables).where(drug_tables: { name: 'calcium' })

    Response time: 124 ms

    Total records found: 364

    A few typical records:

    〈 BiomedicalPathway id: 6, pathway_name: "Alpha Linolenic Acid and Linoleic Acid Metabolism", pathway_label: "Metabolic", description: "Linoleic acid is a member of essential fatty acids...", other_id: "SMP0000018" 〉

    〈 BiomedicalPathway id: 11, pathway_name: "beta-Alanine Metabolism", pathway_label: "Metabolic", description: "beta-Alanine is formed by the proteolytic degradat...", other_id: "SMP0000007" 〉

    〈 BiomedicalPathway id: 24, pathway_name: "Folate Metabolism", pathway_label: "Metabolic", description: "The 1-carbon transformations require folic acid (f...", other_id: "SMP0000053" 〉

    〈 BiomedicalPathway id: 30, pathway_name: "Malate-Aspartate Shuttle", pathway_label: "Metabolic", description: "The malate-aspartate shuttle (also known as the ma...", other_id: "SMP0000129" 〉

  4. Question 4: “Find all the drugs related to the adverse event Nausea”

    Database query: DrugTable.joins(:adverse_events).where(adverse_events: { adverse_event: 'Nausea' }).pluck(:name, :state, :group)

    Response time: 500.8 ms

    Total records found: 798

    A few typical records:

    〈 "Lepirudin", "liquid", "['approved']" 〉

    〈 "Bivalirudin", "solid", "['approved', 'investigational']" 〉

    〈 "Salmon Calcitonin", "liquid", "['approved', 'investigational']" 〉

    〈 "Insulin Lispro", "liquid", "['approved']" 〉

    〈 "Cetrorelix", "solid", "['approved', 'investigational']" 〉

  5. Question 5: “Find the drugs associated with the adverse event Rash for one particular physiological system, such as sensory organs”

    Database query: DrugTable.joins(:adverse_events, :physiology_tables).where(adverse_events: { adverse_event: 'Rash' }).where(physiology_tables: { physiological_system: 'Sensory organ' }).pluck(:name)

    Response time: 58.8 ms

    Total records found: 56

    A few typical records:

    〈 "Azithromycin" 〉 〈 "Moxifloxacin" 〉 〈 "Alclometasone" 〉 〈 "Sulfisoxazole" 〉 〈 "Indomethacin" 〉 〈 "Timolol" 〉

  6. Question 6: “Find all genes related to the disease Fever”

    Database query: DiseaseTable.find_by(disease_name: 'Fever').gene_tables

    Response time: 248.1 ms

    Total records found: 26

    A few typical records:

    〈 GeneTable id: 724, full_name: "cholecystokinin", source: "MIM:118440|HGNC:HGNC:1569|Ensembl:ENSG00000187094", chromosome: "3", locus: "3p22.1", symbol: "CCK", synonyms: "-", organism: "Homo sapiens", gene_type: "protein-coding" 〉

    〈 GeneTable id: 1136, full_name: "corticotropin releasing hormone", source: "MIM:122560|HGNC:HGNC:2355|Ensembl:ENSG00000147571", chromosome: "8", locus: "8q13.1", symbol: "CRH", synonyms: "CRF|CRH1", organism: "Homo sapiens", gene_type: "protein-coding" 〉

    〈 GeneTable id: 1177, full_name: "colony stimulating factor 3", source: "MIM:138970|HGNC:HGNC:2438|Ensembl:ENSG00000108342", chromosome: "17", locus: "17q21.1", symbol: "CSF3", synonyms: "C17orf33|CSF3OS|GCSF", organism: "Homo sapiens", gene_type: "protein-coding" 〉

    〈 GeneTable id: 2029, full_name: "follicle stimulating hormone subunit beta", source: "MIM:136530|HGNC:HGNC:3964|Ensembl:ENSG00000131808", chromosome: "11", locus: "11p14.1", symbol: "FSHB", synonyms: "HH24", organism: "Homo sapiens", gene_type: "protein-coding" 〉

    〈 GeneTable id: 2590, full_name: "high mobility group box 1", source: "MIM:163905|HGNC:HGNC:4983|Ensembl:ENSG00000189403", chromosome: "13", locus: "13q12.3", symbol: "HMGB1", synonyms: "HMG-1|HMG1|HMG3|SBP-1", organism: "Homo sapiens", gene_type: "protein-coding" 〉

  7. Question 7: “Find the molecular formula and isomeric SMILES for the drug named Bivalirudin”

    Database query: DrugChemical.joins(:drug_tables).where(drug_tables: { name: "Bivalirudin" }).pluck(:molecular_formula, :isomeric_smiles).flatten

    Response time: 22 ms

    Total records found: 1

    〈 "C98H138N24O33", "CC[C@H](C)[C@@H](C(=O)N1CCC[C@H]1C(=N[C@@H](CCC(=O)O)C(=N[C@@H](CCC(=O)O)C(=N[C@@H](CC2=CC=C(C=C2)O)C(=N[C@@H](CC(C)C)C(=O)O)O)O)O)O)N=C([C@H](CCC(=O)O)N=C([C@H](CCC(=O)O)N=C((CC3=CC=CC=C3)N=C([C@H](CC(=O)O)N=C(CN=C([C@H](CC(=N)O)N=C(CN=C(CN=C(CN=C(CN=C([C@@H]4CCCN4C(=O)[C@H](CCCNC(=N)N)N=C([C@@H]5CCCN5C(=O)[C@@H](CC6=CC=CC=C6)N)O)O)O)O)O)O)O)O)O)O)O)O" 〉

  8. Question 8: “Find all drug names containing amino acids in their subclass”

    Database query: DrugTable.joins(:drug_classification).where("drug_classifications.d_subclass like '%Amino Acids%'").pluck(:name).uniq

    Response time: 11.2 ms

    Total records found: 1813

    A few typical records:

    〈 "Lepirudin" 〉 〈 "Cetuximab" 〉 〈 "Dornase alfa" 〉 〈 "Denileukin diftitox" 〉 〈 "Etanercept" 〉 〈 "Leuprolide" 〉
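Several of the queries above (Questions 4 and 5, for instance) join the drug table to another entity table and then filter on the joined table. The plain-Ruby sketch below imitates that join-then-filter pattern on a few in-memory rows, so that the logic can be followed without a database. The struct names, the numeric ids, and the link-table layout are assumptions made for illustration; the source does not describe the actual BIOINTMED schema. The drug and adverse-event names are taken from the example results above, and the rows cover only a tiny subset of them.

```ruby
# Assumed, simplified layout: two entity tables and one link table.
Drug = Struct.new(:id, :name)
AdverseEvent = Struct.new(:id, :adverse_event)
DrugEventLink = Struct.new(:drug_id, :adverse_event_id)

# Illustrative rows only; ids are invented for this sketch.
DRUG_ROWS = [
  Drug.new(1, "Lepirudin"),
  Drug.new(2, "Bivalirudin"),
  Drug.new(3, "Timolol")
].freeze

EVENT_ROWS = [
  AdverseEvent.new(1, "Nausea"),
  AdverseEvent.new(2, "Rash")
].freeze

LINK_ROWS = [
  DrugEventLink.new(1, 1), # Lepirudin   -> Nausea
  DrugEventLink.new(2, 1), # Bivalirudin -> Nausea
  DrugEventLink.new(3, 2)  # Timolol     -> Rash
].freeze

# Roughly what joins(:adverse_events).where(...).pluck(:name) computes:
# filter the events, follow the link rows, then project the drug names.
def drug_names_for_event(event_name)
  event_ids = EVENT_ROWS.select { |e| e.adverse_event == event_name }.map(&:id)
  drug_ids = LINK_ROWS.select { |l| event_ids.include?(l.adverse_event_id) }.map(&:drug_id)
  DRUG_ROWS.select { |d| drug_ids.include?(d.id) }.map(&:name)
end

p drug_names_for_event("Nausea") # prints ["Lepirudin", "Bivalirudin"]
```

In a relational database the same steps are carried out by the query planner, typically with indexes on the link table, rather than by scanning arrays as this sketch does.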

About this article

Cite this article

Saha, A., Mukhopadhyay, J., Sarkar, S. et al. BIOINTMED: integrated biomedical knowledge base with ontologies and clinical trials. Med Biol Eng Comput 58, 2339–2354 (2020). https://doi.org/10.1007/s11517-020-02201-0

  • DOI: https://doi.org/10.1007/s11517-020-02201-0
