Tagging Medical Documents with High Accuracy

Hahn, Udo; Wermter, Joachim

doi:10.1007/978-3-540-28633-2_90

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3157)

Abstract

We ran both Brill’s rule-based tagger and TnT, a statistical tagger, with a default German newspaper-language model on a medical text corpus. Supplied with limited lexicon resources, TnT outperformed the Brill tagger and reached state-of-the-art performance (close to 97% accuracy). We then trained TnT on a large annotated medical text corpus, using a slightly extended tagset that captures certain particularities of medical language, and achieved 98% tagging accuracy. Hence, statistical off-the-shelf POS taggers can not only be reused immediately for medical NLP but also achieve, when trained on medical corpora, a higher performance level than on the newspaper genre.
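The accuracy figures in the abstract can be read as per-token tagging accuracy, that is, the share of tokens whose predicted tag matches the gold-standard tag. The abstract does not spell out the exact evaluation protocol, so the following Python sketch rests on that assumption. The function name `tagging_accuracy`, the sample sentence, and the STTS-style tags are illustrative and are not taken from the authors' corpus or code.

```python
from typing import Sequence, Tuple

# A tagged sentence is a sequence of (token, tag) pairs.
TaggedSentence = Sequence[Tuple[str, str]]


def tagging_accuracy(gold: Sequence[TaggedSentence],
                     predicted: Sequence[TaggedSentence]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentences must share the same tokenization")
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            total += 1
            correct += gold_tag == pred_tag  # True counts as 1
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct / total


# Hypothetical example: one German sentence with STTS-style tags.
gold = [[("Der", "ART"), ("Patient", "NN"), ("klagt", "VVFIN"),
         ("über", "APPR"), ("Schmerzen", "NN"), (".", "$.")]]
# Hypothetical tagger output with one error ("Schmerzen" tagged as a proper noun).
predicted = [[("Der", "ART"), ("Patient", "NN"), ("klagt", "VVFIN"),
              ("über", "APPR"), ("Schmerzen", "NE"), (".", "$.")]]

accuracy = tagging_accuracy(gold, predicted)
print(f"Tagging accuracy: {accuracy:.2%}")
```

On a real test set, the same function would be called with the annotated corpus as `gold` and the tagger output as `predicted`; a reported accuracy of 98% then corresponds to roughly two mis-tagged tokens per hundred.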

Author information

Authors and Affiliations

Text Knowledge Engineering Lab, Friedrich-Schiller-Universität Jena, Fürstengraben 30, D-07743, Jena, Germany
Udo Hahn & Joachim Wermter

Editor information

Editors and Affiliations

Faculty of Engineering and Information Technology, Centre for Quantum Computation and Intelligent Systems, and Australian ACS National Committee for Artificial Intelligence, University of Technology, Sydney, Australia
Chengqi Zhang
Department of Computer Science, Auckland University of Technology, 1020, Auckland, New Zealand
Hans W. Guesgen
Artificial Intelligence Technology Centre, Auckland University of Technology, Auckland, New Zealand
Wai-Kiang Yeap

About this paper

Cite this paper

Hahn, U., Wermter, J. (2004). Tagging Medical Documents with High Accuracy. In: Zhang, C., Guesgen, H.W., Yeap, WK. (eds) PRICAI 2004: Trends in Artificial Intelligence. PRICAI 2004. Lecture Notes in Computer Science, vol 3157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28633-2_90

DOI: https://doi.org/10.1007/978-3-540-28633-2_90
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22817-2
Online ISBN: 978-3-540-28633-2
eBook Packages: Springer Book Archive
