ABSTRACT
This paper introduces DIDO, a system that provides convenient access to knowledge about the factors involved in human diseases, automatically extracted from textual Web sources. The knowledge base is bootstrapped by integrating entities from hand-crafted sources such as MeSH and OMIM. Because these sources contain few relationships between different types of biomedical entities, DIDO employs flexible and robust pattern learning and constraint-based reasoning methods to automatically extract new relational facts from textual sources. These facts can then be iteratively added to the knowledge base. The result is a semantic graph of typed entities and relations between diseases, their symptoms, and their factors, with an emphasis on environmental factors but also covering molecular determinants. We demonstrate the value of DIDO for knowledge discovery about causal factors and properties of complex diseases, including factor-disease chains.
Index Terms
- DIDO: a disease-determinants ontology from web sources