
Semantic Data Integration for Life Science Entities

  • Reference work entry
Encyclopedia of Database Systems

Synonyms

Object identification; Data fusion; Duplicate detection; LSID

Definition

An entity is the representation of a (not necessarily physical) real-world object, such as a gene, a protein, or a disease, within a database. To integrate information about the same entities from different databases, these representations must be analyzed to uncover the corresponding underlying objects. This process is called entity identification. A variation of entity identification is duplicate detection, which analyzes two or more entities to determine whether they represent the same real-world object. Finally, data fusion is the process of generating a single, homogeneous representation from multiple, possibly inconsistent entities that represent the same real-world object.
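As a rough illustration of the last two notions, the following minimal sketch treats entities as plain dictionaries of descriptive attributes. Everything in it is an assumption made for illustration and is not taken from the entry: the attribute names (`name`, `organism`, `length`), the string-similarity threshold of 0.8, the rule that the organism must match exactly, and the "first non-missing value wins" fusion policy. Real life science integration typically relies on richer evidence and more careful conflict resolution.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_duplicate(e1: dict, e2: dict, threshold: float = 0.8) -> bool:
    """Duplicate detection (illustrative rule): same organism and similar names.

    The threshold is an arbitrary assumption, not a recommended value.
    """
    return (
        e1["organism"] == e2["organism"]
        and name_similarity(e1["name"], e2["name"]) >= threshold
    )


def fuse(entities: list[dict]) -> dict:
    """Data fusion (illustrative policy): keep the first non-missing value
    seen for each attribute, in the order the entities are given."""
    fused: dict = {}
    for entity in entities:
        for key, value in entity.items():
            if value is not None and key not in fused:
                fused[key] = value
    return fused


# Hypothetical records, as they might appear in two different databases.
record_a = {"name": "Tumor protein p53", "organism": "Homo sapiens", "length": None}
record_b = {"name": "tumor protein P53", "organism": "Homo sapiens", "length": 393}
record_c = {"name": "Insulin", "organism": "Homo sapiens", "length": 110}

if is_duplicate(record_a, record_b):
    merged = fuse([record_a, record_b])
```

Here `record_a` and `record_b` are judged to describe the same object and are fused into one representation that takes the missing `length` from `record_b`, while `record_c` is not matched to either. The order-dependent fusion policy is the simplest possible choice; it silently discards conflicting values, which a production system would normally record or resolve explicitly.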

When entities have globally unique keys, such as ISBN numbers in the case of books, entity identification and duplicate detection are simple. However, in life science databases, one usually has only descriptive...




© 2009 Springer Science+Business Media, LLC

About this entry

Cite this entry

Leser, U. (2009). Semantic Data Integration for Life Science Entities. In: LIU, L., ÖZSU, M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9_627
