Evaluation of Disambiguation Strategies on Biomedical Text Categorization

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2016)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9656)

Abstract

A common way of representing a text is as a Bag of Words (BoW). This representation ignores the semantics of the original text, so the resulting document representations lack sense information. Conceptualization using background knowledge, by contrast, enriches document representation models. When the senses of a polysemous term are looked up in semantic resources, multiple matches are detected, which introduces ambiguity into the final document representation. Three disambiguation strategies can be used: First Concept, All Concepts and Context-Based. SenseRelate is a well-known Context-Based algorithm that uses a fixed window size and applies a distance weight reflecting how far each context term is from the target word. This may negatively affect the concepts or senses that are selected.
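The three strategies named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the concept inventory `CONCEPTS`, the identifiers such as `C_common_cold`, and the `RELATEDNESS` scores are all invented for illustration, and the Context-Based function uses a simple sum of best relatedness scores as a stand-in for a real measure computed over a semantic resource.

```python
# Toy concept inventory; every identifier and score here is hypothetical.
CONCEPTS = {
    "cold": ["C_low_temperature", "C_common_cold"],  # polysemous term
    "virus": ["C_virus"],
    "fever": ["C_fever"],
}

# Hypothetical symmetric relatedness scores between concept pairs.
RELATEDNESS = {
    frozenset(["C_common_cold", "C_virus"]): 0.9,
    frozenset(["C_common_cold", "C_fever"]): 0.8,
    frozenset(["C_low_temperature", "C_virus"]): 0.1,
    frozenset(["C_low_temperature", "C_fever"]): 0.2,
}


def relatedness(a, b):
    """Look up the toy relatedness of two concepts (0.0 if unknown)."""
    return RELATEDNESS.get(frozenset([a, b]), 0.0)


def first_concept(term, context):
    """First Concept: keep only the first match returned by the resource."""
    return CONCEPTS[term][:1]


def all_concepts(term, context):
    """All Concepts: keep every match, leaving the ambiguity in place."""
    return list(CONCEPTS[term])


def context_based(term, context):
    """Context-Based: keep the match most related to the context terms."""
    def score(candidate):
        return sum(
            max((relatedness(candidate, c) for c in CONCEPTS.get(w, [])),
                default=0.0)
            for w in context
        )
    return [max(CONCEPTS[term], key=score)]


context = ["virus", "fever"]
print(first_concept("cold", context))
print(all_concepts("cold", context))
print(context_based("cold", context))
```

In this toy setup, First Concept depends entirely on the order in which the resource lists its matches, whereas the Context-Based choice depends on the surrounding terms.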

To overcome this problem, and thereby improve biomedical Word Sense Disambiguation (WSD), in this paper we propose a simple modified version of the SenseRelate algorithm, named NoDistanceSenseRelate, which ignores the distance: all terms in the context receive the same distance weight.
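The difference between the two weighting schemes can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact weighting formula, so the `1 / distance` weight, the sense inventory `SENSES`, the `RELATEDNESS` scores, and the function name `disambiguate` are all hypothetical.

```python
# Hypothetical sense inventory and relatedness scores, for illustration only.
SENSES = {
    "cold": ["C_common_cold", "C_low_temperature"],
    "weather": ["C_weather"],
    "virus": ["C_virus"],
    "fever": ["C_fever"],
}

RELATEDNESS = {
    frozenset(["C_low_temperature", "C_weather"]): 0.9,
    frozenset(["C_common_cold", "C_virus"]): 0.6,
    frozenset(["C_common_cold", "C_fever"]): 0.6,
}


def relatedness(a, b):
    """Toy relatedness lookup (0.0 for unknown pairs)."""
    return RELATEDNESS.get(frozenset([a, b]), 0.0)


def disambiguate(tokens, target_index, window, use_distance):
    """Score each sense of the target against terms in a fixed window.

    use_distance=True  -> assumed distance weighting of 1 / distance
    use_distance=False -> uniform weights (the no-distance variant)
    """
    target = tokens[target_index]
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    scores = {}
    for candidate in SENSES[target]:
        total = 0.0
        for i in range(lo, hi):
            # Skip the target itself and terms missing from the inventory.
            if i == target_index or tokens[i] not in SENSES:
                continue
            weight = 1.0 / abs(i - target_index) if use_distance else 1.0
            best = max(relatedness(candidate, s) for s in SENSES[tokens[i]])
            total += weight * best
        scores[candidate] = total
    return max(scores, key=scores.get), scores


tokens = ["fever", "virus", "and", "cold", "weather"]
print(disambiguate(tokens, 3, window=3, use_distance=True))
print(disambiguate(tokens, 3, window=3, use_distance=False))
```

With these invented scores, the distance-weighted variant favours the sense supported by the single adjacent term, while the uniform variant favours the sense supported by the two more distant terms, which shows how the weighting alone can change the selected sense.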

To illustrate the effectiveness of the SenseRelate and NoDistanceSenseRelate algorithms relative to the other methods, several experiments were conducted on the OHSUMED corpus. The results, obtained with a Biomedical Text Categorization system based on three machine learning models, Support Vector Machine (SVM), Naïve Bayes (NB) and Maximum Entropy (ME), show that the Context-Based methods (SenseRelate and NoDistanceSenseRelate) outperform the other ones.

Author information

Corresponding author

Correspondence to Mohammed Rais.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Rais, M., Lachkar, A. (2016). Evaluation of Disambiguation Strategies on Biomedical Text Categorization. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2016. Lecture Notes in Computer Science, vol 9656. Springer, Cham. https://doi.org/10.1007/978-3-319-31744-1_68

  • DOI: https://doi.org/10.1007/978-3-319-31744-1_68

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31743-4

  • Online ISBN: 978-3-319-31744-1

  • eBook Packages: Computer Science, Computer Science (R0)
