ABSTRACT
Since Swanson proposed the Undiscovered Public Knowledge (UPK) model, there have been many approaches to uncovering UPK by mining the biomedical literature. These earlier works, however, required substantial manual intervention to reduce the number of possible connections and were mainly applied to disease-effect relations. With the advancement of biomedical science, it has become imperative to extract and combine information from multiple disjoint studies and articles to infer new hypotheses and expand knowledge. In this paper, we propose MKEM, a Multi-level Knowledge Emergence Model, to discover implicit relationships using Natural Language Processing techniques such as Link Grammar and ontology-based resources such as Unified Medical Language System (UMLS) MetaMap. The contributions of MKEM are as follows: First, we propose a flexible knowledge emergence model to extract implicit relationships across different levels, such as the molecular level for genes and proteins and the phenomic level for diseases and treatments. Second, we employ MetaMap for tagging biological concepts. Third, we provide an empirical and systematic approach to discovering novel relationships. Our experiments show that MKEM is a powerful tool for discovering hidden relationships among the extracted entities, which are represented by our Substance-Effect-Process-Disease-Body Part (SEPDB) model.
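As a rough illustration of the kind of implicit relationship the abstract refers to, the sketch below links two terms that never appear together in a single explicit relation but share an intermediate term. It is only a toy transitive-inference example in the spirit of Swanson's fish oil and Raynaud's syndrome case, not the MKEM pipeline: the relation list, the `EXPLICIT` name, and the `implicit_links` function are all hypothetical, and no Link Grammar parsing, MetaMap tagging, or SEPDB typing is involved.

```python
from collections import defaultdict

# Toy explicit relations (source term, target term), as might be extracted
# from separate articles. Illustrative assumption, not data from the paper.
EXPLICIT = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "Raynaud's syndrome"),
    ("platelet aggregation", "Raynaud's syndrome"),
    ("fish oil", "triglycerides"),
]


def implicit_links(relations):
    """Map each (a, c) pair with no direct relation to its bridging terms b.

    A pair (a, c) is reported when relations a -> b and b -> c both exist
    for at least one b, and a -> c is not itself an explicit relation.
    """
    graph = defaultdict(set)
    for a, b in relations:
        graph[a].add(b)
    direct = set(relations)

    found = defaultdict(set)
    for a, intermediates in list(graph.items()):
        for b in intermediates:
            # .get avoids creating empty entries for terms with no outgoing links
            for c in graph.get(b, ()):
                if c != a and (a, c) not in direct:
                    found[(a, c)].add(b)
    return {pair: sorted(bridges) for pair, bridges in found.items()}


if __name__ == "__main__":
    for (a, c), bridges in implicit_links(EXPLICIT).items():
        print(f"{a} -> {c} via {', '.join(bridges)}")
```

A real system would also need to rank or prune the candidate pairs, since the number of possible connections grows quickly with the size of the literature; the abstract names this as the main weakness of earlier approaches.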
- Swanson, DR. 1986. Fish-oil, Raynaud's Syndrome, and undiscovered public knowledge. Perspectives in Biology and Medicine, 30(1):7--18.
- Swanson, DR. 1986. Undiscovered public knowledge. Library Quarterly, 56(2):103--118.
- Swanson, DR. 2001. On the fragmentation of knowledge, the connection explosion and assembling other people's ideas. Bull. Amer. Soc. Inf. Sci. Technol. (Feb 2001), Vol. 27, No. 3, pp. 12--14.
- Lindsay, R. K., and Gordon, M. D. 1999. Literature-based discovery by lexical statistics. Journal of the American Society for Information Science, 50(7):574--587.
- Pratt, Wanda and Yetisgen-Yildiz, Meliha. 2003. LitLinker: capturing connections across the biomedical literature. K-CAP'03 (Oct. 23-25, 2003), pp. 105--112, Sanibel Island, FL.
- Srinivasan, P. 2004. Text mining: Generating hypotheses from MEDLINE. Journal of the American Society for Information Science, Vol. 55, No. 4, pp. 396--413.
- Weeber, M., Vos, R., Klein, H., de Jong-Van den Berg, L.T.W., Aronson, A. & Molema, G. 2003. Generating hypotheses by discovering implicit associations in the literature: A case report for new potential therapeutic uses for Thalidomide. Journal of the American Medical Informatics Association, 10(3):252--259.
- Hristovski D, Stare J, Peterlin B, and Dzeroski S. 2001. Supporting discovery in medicine by association rule mining in Medline and UMLS. Medinfo. 2001, 10(Pt 2), 1344--8.
- Atkinson R and Rivas A. 2008. Discovering Novel Causal Patterns from Biomedical Natural-Language Texts using Bayesian Nets. IEEE Transactions on Information Technology in Biomedicine (November 2008), Vol. 12, No. 6.
- Srinivasan, P. 2004. Text mining: Generating hypotheses from MEDLINE. Journal of the American Society for Information Science, Vol. 55, No. 4, pp. 396--413.
- Agrawal, R., et al. 1995. Fast Discovery of Association Rules. Advances in Knowledge Discovery and Data Mining, U. Fayyad, et al., Editors. AAAI/MIT Press.
- Gordon, MD, Dumais, S. 1998. Using latent semantic indexing for literature based discovery. Journal of the American Society for Information Science, 49:674--685.
- Gordon, MD, Lindsay, RK. 1996. Toward discovery support systems: a replication, re-examination and extension of Swanson's work on literature-based discovery of a connection between Raynaud's and fish oil. Journal of the American Society for Information Science, 47:116--128.
- Lindsay, RK, Gordon, MD. 1999. Literature-based discovery by lexical statistics. Journal of the American Society for Information Science, 50:574--587.
- Small Molecule Subgraph Detector -- http://www.ebi.ac.uk/thorntonsrv/software/SMSD/
- Hu X., Yoo I., Song M., Zhang Y., Song I-Y. 2005. Mining Undiscovered Public Knowledge from Complementary and Non-interactive Biomedical Literature through Semantic Pruning. In ACM Fourteen Conference on Information and Knowledge Management (ACM CIKM 2005).
- Dae-Hee Lee, Juong G Rhee and Yong J Lee. 2009. Reactive oxygen species up-regulate p53 and Puma; a possible mechanism for apoptosis during combined treatment with TRAIL and wogonin. British Journal of Pharmacology (2009).
- Mao Li, Zhuo Zhang, Donald L. Hill, Xinbin Chen, Hui Wang, and Ruiwen Zhang. 2005. Genistein, a Dietary Isoflavone, Down-Regulates the MDM2 Oncogene at Both Transcriptional and Posttranslational Levels. Cancer Res. 2005 65: (18).
- Srinivasan, Padmini, and Thomas C. Rindflesch. 2002. Exploring text mining from MEDLINE. Proceedings of the 2002 AMIA Annual Symposium, 722--6.
- Rindflesch TC, Fiszman M. 2003. The Interaction of Domain Knowledge and Linguistic Structure in Natural Language Processing: Interpreting Hypernymic Propositions in Biomedical Text. Journal of Biomedical Informatics. 2003;36(6):462--77.
- Fiszman Marcelo, Thomas C. Rindflesch, Halil Kilicoglu. 2003. Integrating a hypernymic proposition interpreter into a semantic processor for biomedical text. Proceedings of the 2003 AMIA Annual Symposium.
Index Terms
- MKEM: a multi-level knowledge emergence model for mining undiscovered public knowledge