Detecting Invalid Dictionary Entries for Biomedical Text Mining

Takeuchi, Hironori; Yoshida, Issei; Ikawa, Yohei; Iida, Kazuo; Fukui, Yoko

doi:10.1007/11683568_10

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3886)

Abstract

In text mining, calculating precise keyword frequency distributions for a particular document collection requires mapping different keywords that denote the same entity to a single canonical form. In the life science domain, we can construct a large dictionary of canonical forms and their variants from external resources and use it for term aggregation. However, such an automatically generated dictionary contains many invalid entries, which distort the calculated keyword frequencies. In this paper, we propose and test methods to detect invalid entries in the dictionary.
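To make the problem concrete, here is a minimal Python sketch of dictionary-based term aggregation and of how a single invalid entry can distort the resulting frequencies. It is not the authors' method: the dictionary contents, the `GENE_X` canonical form, and the `aggregate` helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical variant -> canonical form dictionary (illustrative entries only).
VARIANT_TO_CANONICAL = {
    "tumor necrosis factor": "TNF",
    "tnf": "TNF",
    "tnf-alpha": "TNF",
    "interleukin 6": "IL6",
    "il-6": "IL6",
    # Hypothetical invalid entry: a common English word listed as a variant.
    "and": "GENE_X",
}


def aggregate(keywords, dictionary):
    """Count keywords after mapping each known variant to its canonical form."""
    counts = Counter()
    for keyword in keywords:
        canonical = dictionary.get(keyword.lower())
        if canonical is not None:
            counts[canonical] += 1
    return counts


keywords = ["TNF-alpha", "tumor necrosis factor", "IL-6", "and", "and", "and"]

# With the invalid entry, GENE_X appears to be the most frequent entity.
counts = aggregate(keywords, VARIANT_TO_CANONICAL)

# Dropping the invalid entry removes the spurious count.
cleaned = {v: c for v, c in VARIANT_TO_CANONICAL.items() if v != "and"}
cleaned_counts = aggregate(keywords, cleaned)
```

In this toy input, the ordinary word "and" is counted three times under `GENE_X`, more than any genuine entity, which is the kind of distortion that motivates detecting invalid entries before aggregation.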

Author information

Authors and Affiliations

IBM Research, Tokyo Research Laboratory, IBM Japan, Ltd., Shimotsuruma, Yamato-shi Kanagawa, 1623-14, Japan
Hironori Takeuchi, Issei Yoshida & Yohei Ikawa
Research Institute of Bio-system Informatics, Tohoku Chemical Co., Ltd., Odouri 3-3-10 Morioka-shi Iwate, Japan
Kazuo Iida & Yoko Fukui

Editor information

Editors and Affiliations

Brain Tumor Research Program, Children’s Memorial Hospital, and Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Eric G. Bremer
Computer Science Department, Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
Jörg Hakenberg
iXmatch Inc., 5555 West 78th Street Suite E, 55439-2702, Minneapolis, MN, USA
Eui-Hong (Sam) Han
School of Biomedical Sciences, University of Ulster, Cromore Road, BT52 1SA, Coleraine, Northern Ireland, UK
Daniel Berrar
School of Biomedical Sciences, Bioinformatics Research Group, University of Ulster, Cromore Road, BT52 1SA, Coleraine, Northern Ireland, UK
Werner Dubitzky

About this paper

Cite this paper

Takeuchi, H., Yoshida, I., Ikawa, Y., Iida, K., Fukui, Y. (2006). Detecting Invalid Dictionary Entries for Biomedical Text Mining. In: Bremer, E.G., Hakenberg, J., Han, E.-H., Berrar, D., Dubitzky, W. (eds) Knowledge Discovery in Life Science Literature. KDLL 2006. Lecture Notes in Computer Science, vol 3886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11683568_10

DOI: https://doi.org/10.1007/11683568_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32809-4
Online ISBN: 978-3-540-32810-0
eBook Packages: Computer Science, Computer Science (R0)