An Empirical Study of Word Sense Disambiguation for Biomedical Information Retrieval System

Rais, Mohammed; Lachkar, Abdelmonaime

doi:10.1007/978-3-319-78723-7_27

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10813)

Included in the following conference series:

International Conference on Bioinformatics and Biomedical Engineering

Abstract

Document representation is a key stage in indexing biomedical documents. The usual way to represent a text is as a bag of words (BoW). This representation lacks sense: it ignores all the semantics that reside in the original text. Conceptualization using background knowledge, by contrast, enriches document representation models. Three strategies can be used to carry out conceptualization: Adding Concept, Partial Conceptualization, and Complete Conceptualization. When the senses of a polysemous term are looked up in a semantic resource, multiple matches are found, which introduces ambiguity into the final document representation. Three disambiguation strategies can be used to resolve it: First Concept, All Concepts, and Context-Based. SenseRelate is a well-known Context-Based algorithm that uses a fixed window size and weights each context term by how far it is from the target word. Because this distance weighting may negatively affect the selected concepts (senses), we propose a simple modified version of SenseRelate, named NoDistanceSenseRelate, which ignores distance: every term in the context receives the same weight. To evaluate the effect of the conceptualization and disambiguation strategies on the indexing process, we conducted several experiments with a biomedical information retrieval system on the OHSUMED corpus. The results show that the Context-Based methods (SenseRelate and NoDistanceSenseRelate) outperform the other disambiguation strategies when the Adding Concept conceptualization strategy is applied. These results provide evidence in favor of adding concept senses to the term representation in the IR process.
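The abstract names the conceptualization and disambiguation strategies without spelling them out. The following is a minimal sketch under stated assumptions, not the chapter's implementation: Adding Concept keeps each term and appends its concepts, Partial Conceptualization replaces a term by its concepts when it has any and keeps it otherwise, and Complete Conceptualization keeps only concepts. These readings follow how the strategies are commonly described in the conceptualization literature. The lexicon `LEXICON`, its placeholder concept IDs, and the functions `conceptualize`, `first_concept`, and `all_concepts` are hypothetical; a real system would query a semantic resource instead. Context-Based selection needs the surrounding words and is not covered here.

```python
from typing import Callable

# Hypothetical toy lexicon: term -> candidate concepts, in the order a semantic
# resource might return them. The IDs are placeholders, not real concept IDs.
LEXICON: dict[str, list[str]] = {
    "cold": ["C_COMMON_COLD", "C_LOW_TEMPERATURE"],
    "fever": ["C_FEVER"],
}

# A disambiguator picks which candidate concepts to keep for one term.
# The term argument is unused by the two strategies below but kept for a uniform interface.
Disambiguator = Callable[[str, list[str]], list[str]]


def first_concept(term: str, candidates: list[str]) -> list[str]:
    """First Concept: keep only the first candidate the resource returned."""
    return candidates[:1]


def all_concepts(term: str, candidates: list[str]) -> list[str]:
    """All Concepts: keep every candidate sense."""
    return list(candidates)


def conceptualize(tokens: list[str], strategy: str,
                  disambiguate: Disambiguator) -> list[str]:
    """Build a bag of features under one of three assumed strategies."""
    features: list[str] = []
    for tok in tokens:
        concepts = disambiguate(tok, LEXICON.get(tok, []))
        if strategy == "adding":        # keep the term, append its concepts
            features.append(tok)
            features.extend(concepts)
        elif strategy == "partial":     # replace mapped terms, keep unmapped ones
            features.extend(concepts if concepts else [tok])
        elif strategy == "complete":    # keep concepts only, drop unmapped terms
            features.extend(concepts)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return features


tokens = ["patient", "has", "cold", "fever"]
print(conceptualize(tokens, "adding", first_concept))
print(conceptualize(tokens, "complete", first_concept))
```

For these tokens, Adding Concept with First Concept yields the original four tokens plus `C_COMMON_COLD` and `C_FEVER`, whereas Complete Conceptualization with First Concept leaves only `["C_COMMON_COLD", "C_FEVER"]`. Swapping in `all_concepts` shows where the ambiguity comes from: both senses of "cold" enter the representation.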

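The difference between SenseRelate and NoDistanceSenseRelate comes down to how each context word is weighted. In SenseRelate as introduced by Patwardhan, Banerjee and Pedersen, each candidate sense of the target is scored roughly by summing, over the context words in the window, its best relatedness to any sense of that word; the highest-scoring sense wins. The sketch below assumes a 1/d distance weight, since the abstract does not state the exact scheme, and uses a crude gloss-overlap count in place of the biomedical relatedness measures a real implementation would use. `GLOSSES`, `relatedness`, and `sense_relate_sketch` are hypothetical names, and the glosses are invented toy data.

```python
# Hypothetical toy data: word -> {placeholder concept ID -> short gloss}.
# These IDs and glosses are invented for illustration; they are not real resource entries.
GLOSSES: dict[str, dict[str, str]] = {
    "cold": {
        "C_COMMON_COLD": "viral airways infection",
        "C_LOW_TEMPERATURE": "low temperature",
    },
    "room": {"C_ROOM": "indoor space kept at a temperature"},
    "cough": {"C_COUGH": "reflex clearing airways"},
    "fever": {"C_FEVER": "sign of infection"},
}


def relatedness(gloss_a: str, gloss_b: str) -> int:
    """Crude stand-in measure (assumption): count of words two glosses share."""
    return len(set(gloss_a.split()) & set(gloss_b.split()))


def sense_relate_sketch(tokens: list[str], target_index: int,
                        window: int = 5, use_distance: bool = True) -> str:
    """Return the best-scoring sense of tokens[target_index].

    use_distance=True weights each context word by 1/distance (assumed scheme).
    use_distance=False gives every context word weight 1, the idea behind
    NoDistanceSenseRelate.
    """
    target_senses = GLOSSES[tokens[target_index]]
    scores = {sense: 0.0 for sense in target_senses}
    start = max(0, target_index - window)
    stop = min(len(tokens), target_index + window + 1)
    for j in range(start, stop):
        word = tokens[j]
        if j == target_index or word not in GLOSSES:
            continue  # skip the target itself and words with no known senses
        weight = 1.0 / abs(j - target_index) if use_distance else 1.0
        for sense, gloss in target_senses.items():
            # Best relatedness between this target sense and any sense of the context word.
            best = max(relatedness(gloss, other) for other in GLOSSES[word].values())
            scores[sense] += weight * best
    # Highest total wins; on a tie, max() keeps the first sense listed.
    return max(scores, key=lambda sense: scores[sense])


tokens = "patient in cold room reports cough and fever".split()
print(sense_relate_sketch(tokens, 2, use_distance=True))   # C_LOW_TEMPERATURE
print(sense_relate_sketch(tokens, 2, use_distance=False))  # C_COMMON_COLD
```

In this toy sentence the adjacent word "room" dominates under 1/d weighting and selects `C_LOW_TEMPERATURE` (score 1.0 against about 0.53), while uniform weighting lets the two farther words "cough" and "fever" outvote it and selects `C_COMMON_COLD` (2 against 1). The example only shows that the two weightings can disagree; it says nothing about which is right in general.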
Author information

Authors and Affiliations

L.I.S.A, Department of Electrical and Computer Engineering, ENSA, USMBA, Fez, Morocco
Mohammed Rais & Abdelmonaime Lachkar

Corresponding author

Correspondence to Mohammed Rais.

Editor information

Editors and Affiliations

University of Granada, Granada, Spain
Ignacio Rojas
University of Granada, Granada, Spain
Francisco Ortuño

About this paper

Cite this paper

Rais, M., Lachkar, A. (2018). An Empirical Study of Word Sense Disambiguation for Biomedical Information Retrieval System. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2018. Lecture Notes in Computer Science, vol 10813. Springer, Cham. https://doi.org/10.1007/978-3-319-78723-7_27

DOI: https://doi.org/10.1007/978-3-319-78723-7_27
Published: 28 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78722-0
Online ISBN: 978-3-319-78723-7
eBook Packages: Computer Science, Computer Science (R0)