Linking Biological Databases Semantically for Knowledge Discovery

  • Conference paper
Advances in Conceptual Modeling – Challenges and Opportunities (ER 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5232)

Abstract

Many important life sciences questions are aimed at studying the relationships and interactions between biological functions/processes and biological entities such as genes. The answers may be found by examining diverse types of biological/genomic databases. Finding these answers, however, requires accessing and retrieving data from diverse biological data sources. More importantly, sophisticated knowledge discovery processes involve traversing large numbers of inherent links among various data sources. Currently, the links among data are either implemented as hyperlinks without explicitly indicating their meanings and labels, or hidden in a seemingly simple text format. Consequently, biologists spend numerous hours identifying potentially useful links and following each lead manually, which is time-consuming and error-prone. Our research is aimed at constructing semantic relationships among all biological entities. We have designed a semantic model to categorize and formally define the links. By incorporating ontologies such as the Gene Ontology and the Sequence Ontology, we propose techniques to analyze the links embedded within and among data records, to explicitly label their semantics, and to facilitate link traversal, querying, and data sharing. Users may then ask complicated and ad hoc questions and even design their own workflows to support their knowledge discovery processes. In addition, we have performed an empirical analysis to demonstrate that our method can not only improve the efficiency of querying multiple databases, but also yield more useful information.
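As a rough illustration of what explicitly labeled links make possible, the sketch below stores links between data records together with a label naming their meaning, and then answers a query by following a chosen sequence of labels. It is not the authors' semantic model: the class names (`SemanticLink`, `LinkGraph`), the label vocabulary (`encodes`, `annotated_with`, `cited_in`), and all record identifiers are hypothetical and chosen only for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticLink:
    """A link between two records with an explicit label (hypothetical vocabulary)."""
    source: str  # record identifier, e.g. "gene:G1" (made up for illustration)
    label: str   # what the link means, e.g. "encodes"
    target: str


class LinkGraph:
    """Stores labeled links and follows them by label."""

    def __init__(self):
        self._out = defaultdict(list)  # source record -> outgoing links

    def add(self, source, label, target):
        self._out[source].append(SemanticLink(source, label, target))

    def follow(self, start, labels):
        """Follow one link per label, in order; return the records reached, sorted."""
        frontier = {start}
        for label in labels:
            frontier = {
                link.target
                for node in frontier
                for link in self._out.get(node, [])
                if link.label == label
            }
        return sorted(frontier)


def build_example():
    """Build a tiny graph of made-up records and links."""
    g = LinkGraph()
    g.add("gene:G1", "encodes", "protein:P1")
    g.add("gene:G1", "cited_in", "article:A1")
    g.add("protein:P1", "annotated_with", "term:T1")
    g.add("protein:P1", "annotated_with", "term:T2")
    g.add("gene:G2", "encodes", "protein:P2")
    g.add("protein:P2", "annotated_with", "term:T2")
    return g


# "Which ontology terms annotate the products of gene G1?"
print(build_example().follow("gene:G1", ["encodes", "annotated_with"]))
# → ['term:T1', 'term:T2']
```

Because each link carries its meaning, the query names the path it wants rather than following every hyperlink by hand; an unlabeled graph could only return everything reachable from the starting record.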

This research is supported in part by research grants #EF0735191 and #IIS0455993 from the National Science Foundation, USA.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ram, S., Zhang, K., Wei, W. (2008). Linking Biological Databases Semantically for Knowledge Discovery. In: Song, IY., et al. Advances in Conceptual Modeling – Challenges and Opportunities. ER 2008. Lecture Notes in Computer Science, vol 5232. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87991-6_4

  • DOI: https://doi.org/10.1007/978-3-540-87991-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87990-9

  • Online ISBN: 978-3-540-87991-6

  • eBook Packages: Computer Science, Computer Science (R0)
