Abstract
This paper presents the LSLink (Life Science Link) methodology, which provides users with a set of tools to explore the rich Web of interconnected and annotated objects in multiple repositories and to identify meaningful associations. Consider a physical link between objects in two repositories, where each object is annotated with controlled vocabulary (CV) terms from one of two ontologies. Using a set of LSLink instances generated from a background dataset of knowledge, we identify associations between pairs of CV terms that are potentially significant and may lead to new knowledge. We develop an approach based on the logarithm of the odds (LOD) to determine the confidence and support of the associations between pairs of CV terms. Using a case study of Entrez Gene objects annotated with GO terms linked to PubMed objects annotated with MeSH terms, we describe a user validation and analysis task to explore potentially significant associations.
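The LOD idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formula, so the code below assumes one common formulation: the base-10 logarithm of the observed co-occurrence frequency of a term pair divided by the frequency expected if the two terms were independent, with the raw co-occurrence count used as support. The function name `lod_scores`, the `min_support` parameter, and the term labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import product


def lod_scores(links, min_support=2):
    """Score CV term pairs across a set of link instances.

    links: iterable of (terms_a, terms_b) pairs, one per link instance,
           e.g. the GO terms of a gene and the MeSH terms of a linked paper.
    Returns {(term_a, term_b): (lod, support)} for pairs whose
    co-occurrence count is at least min_support.

    Assumption: LOD = log10(observed / expected under independence).
    This is an illustrative formulation, not necessarily the paper's.
    """
    links = [(set(a), set(b)) for a, b in links]
    n = len(links)
    count_a, count_b, count_pair = Counter(), Counter(), Counter()
    for terms_a, terms_b in links:
        count_a.update(terms_a)
        count_b.update(terms_b)
        count_pair.update(product(terms_a, terms_b))

    scores = {}
    for (a, b), support in count_pair.items():
        if support < min_support:
            continue  # too few link instances to trust the estimate
        observed = support / n
        expected = (count_a[a] / n) * (count_b[b] / n)
        scores[(a, b)] = (math.log10(observed / expected), support)
    return scores


# Hypothetical link instances with placeholder term labels.
example_links = [
    ({"go_a"}, {"mesh_x"}),
    ({"go_a"}, {"mesh_x"}),
    ({"go_b"}, {"mesh_y"}),
    ({"go_b"}, {"mesh_x"}),
]
example_scores = lod_scores(example_links, min_support=2)
```

A positive score means the pair co-occurs more often than independence would predict. Filtering on support first keeps rare pairs, whose scores are unstable, out of the candidate list presented for user validation.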
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Lee, WJ., Raschid, L., Srinivasan, P., Shah, N., Rubin, D., Noy, N. (2007). Using Annotations from Controlled Vocabularies to Find Meaningful Associations. In: Cohen-Boulakia, S., Tannen, V. (eds) Data Integration in the Life Sciences. DILS 2007. Lecture Notes in Computer Science, vol 4544. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73255-6_20
DOI: https://doi.org/10.1007/978-3-540-73255-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73254-9
Online ISBN: 978-3-540-73255-6
eBook Packages: Computer Science (R0)