
Headwords and Suffixes in Biomedical Names

  • Conference paper
Knowledge Discovery in Life Science Literature (KDLL 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3886)


Abstract

Natural Language Processing (NLP) techniques have been used to extract and mine knowledge from the biomedical literature. A critical step in this task is biomedical named entity recognition (BNER), which usually consists of two steps: first, identifying biomedical names in text, and second, assigning predefined semantic classes to the names identified in the first step. BNER systems frequently use headwords and suffixes as features for assigning semantic classes to names in text. However, few studies have evaluated how well headwords and suffixes predict the semantic classes of biomedical terms when knowledge sources are used in an unsupervised way. We conducted a study to evaluate the performance of headwords and suffixes using names in the Unified Medical Language System (UMLS). The semantic classes associated with these names were obtained by modifying an existing UMLS semantic group system and incorporating the GENIA ontology. We define headwords and suffixes that are significantly associated with a specific semantic class as semantic suffixes. Semantic class assignment using semantic suffixes achieved an F-measure of 86.4% (precision 91.6%, recall 81.7%). When the semantic suffixes obtained from the UMLS were applied to names extracted from the GENIA corpus, the system achieved an F-measure of 73.4% (precision 84.2%, recall 65.1%). These measures improved dramatically when the evaluation was limited to names associated with classes that have corresponding GENIA types.
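The abstract describes semantic suffixes only at a high level. The Python sketch below illustrates the general idea under several assumptions.

- The headword is taken to be the last token of a name.
- Candidate suffixes are character suffixes of that headword.
- A headword or suffix is treated as "semantic" when it is frequent and points overwhelmingly to one class.

The `min_count` and `min_precision` thresholds are hypothetical stand-ins for the paper's significance test, which is not described here. The toy lexicon, the test names and the class labels (`PROTEIN`, `CELL`, `DISEASE`, `CHEMICAL`) are invented for illustration. They are not UMLS or GENIA data.

Each name receives the class of its longest matching key, with the whole headword tried first. Names with no match are left unassigned. Precision is computed over assigned names and recall over all names, which are also assumed definitions.

```python
from collections import Counter, defaultdict


def candidate_keys(name, min_len=3):
    """Yield the headword key, then character-suffix keys from longest to shortest.

    Assumption (hypothetical): the headword is the last whitespace-separated token.
    """
    tokens = name.lower().split()
    if not tokens:
        return
    head = tokens[-1]
    yield "W:" + head  # the whole headword
    for i in range(1, len(head) - min_len + 1):
        yield "S:" + head[i:]  # proper suffixes of at least min_len characters


def mine_semantic_suffixes(lexicon, min_count=2, min_precision=0.8):
    """Keep keys that are frequent and point overwhelmingly to one class.

    The two thresholds are hypothetical stand-ins for a significance test.
    """
    counts = defaultdict(Counter)  # key -> Counter({class: frequency})
    for name, cls in lexicon:
        for key in candidate_keys(name):
            counts[key][cls] += 1
    table = {}
    for key, by_class in counts.items():
        cls, n = by_class.most_common(1)[0]
        if n >= min_count and n / sum(by_class.values()) >= min_precision:
            table[key] = cls
    return table


def assign_class(name, table):
    """Return the class of the longest matching key, or None to abstain."""
    for key in candidate_keys(name):
        if key in table:
            return table[key]
    return None


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(gold_pairs, table):
    """Precision over assigned names; recall over all names (assumed definitions)."""
    assigned = correct = 0
    for name, gold in gold_pairs:
        pred = assign_class(name, table)
        if pred is not None:
            assigned += 1
            correct += pred == gold
    precision = correct / assigned if assigned else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical toy data; not drawn from the UMLS or GENIA.
LEXICON = [
    ("protein kinase", "PROTEIN"), ("tyrosine kinase", "PROTEIN"),
    ("alcohol dehydrogenase", "PROTEIN"), ("lactate dehydrogenase", "PROTEIN"),
    ("interleukin 2 receptor", "PROTEIN"),
    ("T cell", "CELL"), ("B cell", "CELL"), ("mast cell", "CELL"),
    ("lymphocyte", "CELL"), ("monocyte", "CELL"), ("hepatocyte", "CELL"),
    ("hepatitis", "DISEASE"), ("arthritis", "DISEASE"),
]
GOLD = [
    ("casein kinase", "PROTEIN"), ("erythrocyte", "CELL"),
    ("dermatitis", "DISEASE"), ("interleukin 6 receptor", "PROTEIN"),
    ("base", "CHEMICAL"),
]
TABLE = mine_semantic_suffixes(LEXICON)

if __name__ == "__main__":
    p, r, f = evaluate(GOLD, TABLE)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.750 R=0.600 F=0.667
```

On this toy data the script prints `P=0.750 R=0.600 F=0.667`. No key for "interleukin 6 receptor" survives the frequency threshold, so abstaining on it costs recall. The spurious match of "base" to the "-ase" suffix costs precision. As an arithmetic check, `f_measure(0.916, 0.817)` rounds to 0.864, consistent with the F-measure reported above for the UMLS evaluation.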




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Torii, M., Liu, H. (2006). Headwords and Suffixes in Biomedical Names. In: Bremer, E.G., Hakenberg, J., Han, E.H., Berrar, D., Dubitzky, W. (eds) Knowledge Discovery in Life Science Literature. KDLL 2006. Lecture Notes in Computer Science, vol 3886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11683568_3


  • DOI: https://doi.org/10.1007/11683568_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32809-4

  • Online ISBN: 978-3-540-32810-0

  • eBook Packages: Computer Science, Computer Science (R0)
