A method and software framework for enriching private biomedical sources with data from public online repositories

https://doi.org/10.1016/j.jbi.2016.02.004
Under an Elsevier user license (open archive)

Highlights

  • A method for enriching private sources with data from public databases is proposed.

  • Related public data is automatically identified and linked to existing private database records.

  • We tested our approach by linking a private Wilms' tumor dataset to NCBI records.

  • The integrated data can be browsed via SPARQL queries over a common schema (HDOT).

  • The tool can also be used to import data from other public resources and tools.
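The highlights mention browsing the integrated data with SPARQL queries over a common schema. As a rough illustration of what such a query computes, the sketch below evaluates a basic graph pattern over an in-memory list of triples using nested-loop joins. A basic graph pattern is a conjunction of triple patterns, with variables marked by `?`. This is a minimal stand-in. It is not SPARQL itself and not the authors' query engine. The `ex:` and `pub:` terms, and predicates such as `ex:linkedTo`, are hypothetical and are not taken from HDOT.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if p_term not in extended:
                extended[p_term] = t_term  # first occurrence: bind the variable
            elif extended[p_term] != t_term:
                return None  # variable already bound to a different term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def evaluate_bgp(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns with nested-loop joins."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in graph:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical integrated graph: private clinical data linked to a public record.
graph: List[Triple] = [
    ("ex:patient1", "ex:hasSample", "ex:sample1"),
    ("ex:patient2", "ex:hasSample", "ex:sample2"),
    ("ex:sample1", "ex:mentionsGene", "ex:WT1"),
    ("ex:sample2", "ex:mentionsGene", "ex:TP53"),
    ("ex:WT1", "ex:linkedTo", "pub:gene-record-1"),
]

# Which patients have samples mentioning a gene that is linked to a public record?
patterns: List[Triple] = [
    ("?patient", "ex:hasSample", "?sample"),
    ("?sample", "ex:mentionsGene", "?gene"),
    ("?gene", "ex:linkedTo", "?record"),
]

rows = evaluate_bgp(graph, patterns)
for row in rows:
    print(row)
```

Only `ex:patient1` is returned, because `ex:TP53` has no `ex:linkedTo` edge in this toy graph. In a real deployment the integrated graph would sit behind a SPARQL endpoint, and the same question would be a `SELECT` query with these three triple patterns in its `WHERE` clause.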

Abstract

Modern biomedical research relies on the semantic integration of heterogeneous data sources to find data correlations. Researchers access multiple datasets of disparate origin and identify elements (e.g. genes, compounds, pathways) that lead to interesting correlations. Normally, they must then consult additional public databases, such as scientific literature or published clinical trial results, to enrich the information about the identified entities. Semantic integration techniques have traditionally focused on providing homogeneous access to private datasets, which helps automate the first part of this research, and different solutions exist for browsing public data. However, there is still a need for tools that facilitate merging public repositories with private datasets. This paper presents a framework that automatically locates public data of interest to the researcher and semantically integrates it with existing private datasets. The framework has been designed as an extension of traditional data integration systems. It has been validated with an existing data integration platform from a European research project by integrating a private biological dataset with data from the National Center for Biotechnology Information (NCBI).
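The abstract describes a framework that locates public data related to entities in a private dataset and integrates that data as RDF. The following is a minimal sketch of the linking step, with several assumptions:

- Private records are dictionaries that list gene symbols.
- The public repository lookup is replaced by a hypothetical in-memory `public_index`. A real system would query a source such as NCBI over the network.
- Each match is emitted as an N-Triples statement, with `rdfs:seeAlso` used as a generic linking predicate.

All `example.org` IRIs and function names are hypothetical. This is not the paper's implementation, and the authors' common schema (HDOT) may use different terms.

```python
from typing import Dict, List

PRIVATE_BASE = "http://example.org/private/"  # hypothetical namespace for private records
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"  # generic link predicate

# Hypothetical private records: each has an id and the entities (gene symbols) it mentions.
private_records: List[Dict] = [
    {"id": "sample-001", "genes": ["WT1", "CTNNB1"]},
    {"id": "sample-002", "genes": ["TP53"]},
]

# Hypothetical stand-in for a public repository search; real lookups would go over the network.
public_index: Dict[str, List[str]] = {
    "WT1": ["http://example.org/public/gene/WT1"],
    "TP53": ["http://example.org/public/gene/TP53"],
}


def locate_public_records(entity: str) -> List[str]:
    """Return IRIs of public records related to `entity` (empty if none are found)."""
    return public_index.get(entity, [])


def link_records(records: List[Dict]) -> List[str]:
    """Emit one N-Triples line per (private record, related public record) pair."""
    lines: List[str] = []
    seen = set()  # avoid duplicate triples when an entity is mentioned twice
    for record in records:
        subject = f"<{PRIVATE_BASE}{record['id']}>"
        for gene in record["genes"]:
            for iri in locate_public_records(gene):
                line = f"{subject} <{RDFS_SEE_ALSO}> <{iri}> ."
                if line not in seen:
                    seen.add(line)
                    lines.append(line)
    return lines


for line in link_records(private_records):
    print(line)
```

`CTNNB1` produces no triple because the toy index has no entry for it. Emitting N-Triples keeps the output loadable by any RDF store. Replacing `locate_public_records` with a real lookup is the only change the linking loop would need.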

Keywords

Semantic integration
RDF
Public databases
