Abstract
This paper addresses the problem of term ambiguity, particularly in biomedical literature. We propose and evaluate two Word Sense Disambiguation (WSD) methods for biomedical terms and integrate them into a sense-based document indexing and retrieval framework. Ambiguous biomedical terms in documents and queries are disambiguated using the Medical Subject Headings (MeSH) thesaurus and indexed semantically by their correct sense. An experimental evaluation on the TREC9-FT 2000 collection shows that our approach to WSD and sense-based indexing and retrieval outperforms the baseline.
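The general idea of sense-based indexing can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the abstract does not describe the two WSD methods, so the sketch substitutes a simplified gloss-overlap heuristic, and the sense inventory `SENSES`, the sense labels, the sample documents, and all function names are hypothetical stand-ins rather than real MeSH entries. It shows only the shape of the pipeline: ambiguous terms in documents and queries are replaced by a sense label before indexing and matching.

```python
from collections import defaultdict

# Hypothetical toy sense inventory (NOT real MeSH data):
# ambiguous term -> {sense label: words that typically accompany that sense}
SENSES = {
    "cold": {
        "cold/temperature": {"low", "temperature", "exposure", "freezing"},
        "cold/common_cold": {"virus", "infection", "respiratory", "nasal"},
    },
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]


def disambiguate(term, context_tokens):
    """Pick the sense whose word set overlaps most with the context.

    This is a simplified gloss-overlap heuristic used for illustration only.
    Ties are broken alphabetically by sense label.
    """
    senses = SENSES[term]
    context = set(context_tokens)
    return max(sorted(senses), key=lambda s: len(senses[s] & context))


def index_terms(text):
    """Map a text to index keys, replacing ambiguous terms by sense labels."""
    tokens = tokenize(text)
    return [disambiguate(t, tokens) if t in SENSES else t for t in tokens]


def build_index(docs):
    """Build an inverted index from index keys to document identifiers."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for key in index_terms(text):
            index[key].add(doc_id)
    return index


def search(index, query):
    """Return documents sharing at least one index key with the query."""
    results = set()
    for key in index_terms(query):
        results |= index.get(key, set())
    return results


# Hypothetical sample documents.
DOCS = {
    "d1": "Exposure to cold temperature causes freezing injuries.",
    "d2": "The common cold is a respiratory virus infection.",
}
INDEX = build_index(DOCS)
HITS = search(INDEX, "cold virus infection")
```

With plain keyword matching, the query term "cold" would match both sample documents. In this sketch the query and the documents are disambiguated the same way, so the query's "cold" only matches documents indexed under the same sense label.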
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dinh, D., Tamine, L. (2010). Sense-Based Biomedical Indexing and Retrieval. In: Hopfe, C.J., Rezgui, Y., Métais, E., Preece, A., Li, H. (eds) Natural Language Processing and Information Systems. NLDB 2010. Lecture Notes in Computer Science, vol 6177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13881-2_3
DOI: https://doi.org/10.1007/978-3-642-13881-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13880-5
Online ISBN: 978-3-642-13881-2
eBook Packages: Computer Science, Computer Science (R0)