Detecting Invalid Dictionary Entries for Biomedical Text Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3886)

Abstract

In text mining, calculating precise keyword frequency distributions for a particular document collection requires mapping the different keywords that denote the same entity to a single canonical form. In the life science domain, a large dictionary of canonical forms and their variants can be constructed automatically from external resources and then used for this term aggregation. However, such an automatically generated dictionary contains many invalid entries, which distort the calculated keyword frequencies. In this paper, we propose and test methods for detecting invalid entries in the dictionary.
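The aggregation step described above, and the way a single invalid entry skews the counts, can be illustrated with a minimal Python sketch. It is not the authors' method: the dictionary entries, the sample sentences, the greedy longest-match lookup, and the `suspicious_entries` heuristic (a hypothetical common-word list plus a minimum length) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, hand-made dictionary: variant -> canonical form.
# A real dictionary would be generated automatically from external resources.
VARIANT_TO_CANONICAL = {
    "p53": "TP53",
    "tp53": "TP53",
    "tumor protein p53": "TP53",
    "il-2": "IL2",
    "interleukin 2": "IL2",
    "interleukin-2": "IL2",
    "cat": "CAT",  # catalase gene symbol, but also an everyday English word
}

# Hypothetical sample documents.
SAMPLE_DOCS = [
    "The cat was sleeping while we measured p53 levels.",
    "Tumor protein p53 and interleukin-2 were elevated.",
    "IL-2 stimulates T cells; TP53 is a tumor suppressor.",
]

# Lower-cased alphanumeric tokens; internal hyphens stay inside one token.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def build_index(dictionary):
    """Map each variant's token tuple to its canonical form."""
    index = {tuple(tokenize(v)): c for v, c in dictionary.items()}
    max_len = max((len(key) for key in index), default=0)
    return index, max_len


def canonical_frequencies(documents, dictionary):
    """Count canonical forms using greedy longest-match dictionary lookup."""
    index, max_len = build_index(dictionary)
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        i = 0
        while i < len(tokens):
            # Try the longest possible variant first, then shorter ones.
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                key = tuple(tokens[i:i + n])
                if key in index:
                    counts[index[key]] += 1
                    i += n
                    break
            else:  # no dictionary variant starts at position i
                i += 1
    return counts


# Hypothetical stand-in for a list of ordinary English words.
COMMON_WORDS = {"a", "and", "can", "cat", "for", "not", "was"}


def suspicious_entries(dictionary, common_words=COMMON_WORDS, min_length=3):
    """Flag variants that are common words or very short (a toy heuristic)."""
    return sorted(
        v for v in dictionary
        if v.lower() in common_words or len(v) < min_length
    )


if __name__ == "__main__":
    print(canonical_frequencies(SAMPLE_DOCS, VARIANT_TO_CANONICAL))
    flagged = suspicious_entries(VARIANT_TO_CANONICAL)
    cleaned = {v: c for v, c in VARIANT_TO_CANONICAL.items() if v not in flagged}
    print(flagged, canonical_frequencies(SAMPLE_DOCS, cleaned))
```

With the full dictionary, the sentence about a sleeping cat adds a spurious CAT count; dropping the flagged entry removes it, but would also hide genuine mentions of the catalase gene. A crude filter like this trades recall for precision, so it is at best a starting point for detecting invalid entries.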

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Takeuchi, H., Yoshida, I., Ikawa, Y., Iida, K., Fukui, Y. (2006). Detecting Invalid Dictionary Entries for Biomedical Text Mining. In: Bremer, E.G., Hakenberg, J., Han, E.-H., Berrar, D., Dubitzky, W. (eds) Knowledge Discovery in Life Science Literature. KDLL 2006. Lecture Notes in Computer Science, vol 3886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11683568_10

  • DOI: https://doi.org/10.1007/11683568_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32809-4

  • Online ISBN: 978-3-540-32810-0

  • eBook Packages: Computer Science, Computer Science (R0)