Abstract
We discuss problems inherent in domain-specific tagging, here in the biomedical domain, and their relevance to tagging in general. We present a novel approach to these problems, which we call tagging with delayed disambiguation (TDD). The approach combines a modified, statistically driven lexicon with a small set of morphological, heuristic, and chunking rules implemented using finite-state machinery. These rules make use of both delayed disambiguation and tag underspecification, in which an underspecified tag is represented as an ordered sequence of tags.
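The following is a minimal Python sketch of the two ideas the abstract names: an underspecified tag stored as an ordered sequence of candidate tags (most likely first), and disambiguation delayed until contextual rules have had a chance to narrow that sequence. Everything in it is hypothetical and for illustration only. The lexicon, tag orderings, `UNKNOWN` default, and the two rules are invented toy examples, not the authors' lexicon or rule set. The rules are plain Python functions rather than the finite-state implementation the paper describes.

```python
from typing import Dict, List, Set, Tuple

# An underspecified tag: an ordered tuple of candidate tags, most likely first.
Tag = Tuple[str, ...]

# Hypothetical toy lexicon; the orderings are invented, not corpus statistics.
LEXICON: Dict[str, Tag] = {
    "the": ("DT",),
    "protein": ("NN",),
    "binds": ("NNS", "VBZ"),  # ambiguous; picking the first tag alone gives NNS
    "dna": ("NN",),
}
UNKNOWN: Tag = ("NN", "JJ")  # hypothetical default for out-of-lexicon words


def lookup(word: str) -> Tag:
    """Assign an underspecified tag without committing to a single reading."""
    return LEXICON.get(word.lower(), UNKNOWN)


def restrict(tags: Tag, allowed: Set[str]) -> Tag:
    """Filter candidates, keep their order, and never return an empty sequence."""
    kept = tuple(t for t in tags if t in allowed)
    return kept if kept else tags


def apply_rules(seq: List[Tag]) -> List[Tag]:
    """Hypothetical contextual rules, applied left to right after lookup."""
    out = list(seq)
    for i in range(1, len(out)):
        prev = out[i - 1]  # already narrowed by earlier iterations
        if prev == ("DT",):
            # After an unambiguous determiner, keep only nominal readings.
            out[i] = restrict(out[i], {"NN", "NNS", "JJ"})
        elif prev == ("NN",) and "VBZ" in out[i]:
            # After an unambiguous singular noun, prefer the 3sg verb reading.
            out[i] = ("VBZ",)
    return out


def resolve(seq: List[Tag]) -> List[str]:
    """Delayed final step: take the first remaining (most likely) candidate."""
    return [tags[0] for tags in seq]


def tag(words: List[str]) -> List[str]:
    return resolve(apply_rules([lookup(w) for w in words]))


if __name__ == "__main__":
    print(tag(["The", "protein", "binds", "DNA"]))
```

Run as a script, this prints `['DT', 'NN', 'VBZ', 'NN']`, whereas committing to each word's first lexicon tag immediately would yield `NNS` for "binds". For an unknown word such as "p53" after "the", the sequence `("NN", "JJ")` survives the rules unchanged and is only resolved at the final step. That is the point of carrying underspecified tags forward instead of forcing an early choice.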
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Castaño, J.M., Pustejovsky, J. (2006). Tagging with Delayed Disambiguation. In: Yli-Jyrä, A., Karttunen, L., Karhumäki, J. (eds) Finite-State Methods and Natural Language Processing. FSMNLP 2005. Lecture Notes in Computer Science, vol 4002. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780885_28
DOI: https://doi.org/10.1007/11780885_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35467-3
Online ISBN: 978-3-540-35469-7
eBook Packages: Computer Science, Computer Science (R0)