Tagging Medical Documents with High Accuracy

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3157)

Abstract

We ran both Brill’s rule-based tagger and TnT, a statistical tagger, on a medical text corpus, each with a default German newspaper-language model. Supplied with limited lexicon resources, TnT outperforms the Brill tagger and reaches state-of-the-art performance figures (close to 97% accuracy). We then trained TnT on a large annotated medical text corpus, using a slightly extended tagset that captures certain particularities of medical language, and achieved 98% tagging accuracy. Hence, statistical off-the-shelf POS taggers can not only be reused immediately for medical NLP, but also achieve, when trained on medical corpora, a higher performance level than for the newspaper genre.
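The accuracy figures quoted above are most naturally read as per-token tagging accuracy, i.e. the share of tokens whose predicted tag matches the gold-standard annotation. The following is a minimal sketch of that measure under this assumption; it is not the authors' evaluation script. The function name `tagging_accuracy`, the toy sentence, and the STTS-style tags are illustrative only and do not come from the paper.

```python
from typing import List, Tuple

# A tagged sentence is a list of (token, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def tagging_accuracy(gold: List[TaggedSentence],
                     predicted: List[TaggedSentence]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentences must be tokenized identically")
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            total += 1
            if gold_tag == pred_tag:
                correct += 1
    if total == 0:
        raise ValueError("cannot compute accuracy on an empty corpus")
    return correct / total


# Invented toy data (assumption): one German sentence with STTS-style tags.
gold = [[("Der", "ART"), ("Patient", "NN"), ("klagt", "VVFIN"),
         ("über", "APPR"), ("Schmerzen", "NN")]]
# Hypothetical tagger output with one error on "klagt".
predicted = [[("Der", "ART"), ("Patient", "NN"), ("klagt", "NN"),
              ("über", "APPR"), ("Schmerzen", "NN")]]

print(tagging_accuracy(gold, predicted))  # 4 of 5 tags match -> 0.8
```

On a real test set the same ratio would be computed over all held-out tokens; a reported accuracy of 98% then corresponds to roughly two mis-tagged tokens per hundred.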




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hahn, U., Wermter, J. (2004). Tagging Medical Documents with High Accuracy. In: Zhang, C., Guesgen, H.W., Yeap, W.K. (eds) PRICAI 2004: Trends in Artificial Intelligence. PRICAI 2004. Lecture Notes in Computer Science, vol 3157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28633-2_90

  • DOI: https://doi.org/10.1007/978-3-540-28633-2_90

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22817-2

  • Online ISBN: 978-3-540-28633-2

  • eBook Packages: Springer Book Archive
