Simple Weighting Techniques for Query Expansion in Biomedical Document Retrieval

Young-In SONG
Kyoung-Soo HAN
So-Young PARK
Sang-Bum KIM
Hae-Chang RIM

Publication
IEICE TRANSACTIONS on Information and Systems, Vol.E90-D, No.11, pp.1873-1876
Publication Date: 2007/11/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e90-d.11.1873
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Category: Contents Technology and Web Information Systems
Keyword:
query expansion, biomedical terminology, biomedical document retrieval, biomedical terminology weighting

Summary: 
In this paper, we propose two weighting techniques that improve the performance of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model ranks documents containing the longer term more highly, because a longer term has more chances to match the query. This preference is inappropriate and often yields unsatisfactory results. To alleviate this weighting bias, we devise a method of normalizing the weights of the query terms within a long multi-word biomedical term, and a method of discriminating between terms using inverse terminology frequency, a novel statistic estimated in the query domain. Experimental results on the MEDLINE corpus show that our two simple techniques improve retrieval performance by correcting the undue preference for long multi-word terms in an expanded query.
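
The summary does not give the formulas behind the two techniques, so the following Python sketch is only one plausible reading and not the authors' method. It assumes that normalization divides a unit weight evenly among the words of a terminology, and that inverse terminology frequency is an IDF-style statistic, log(N / n_w), computed over a list of terminologies instead of over documents. The function names and the toy lexicon are hypothetical.

```python
import math
from collections import Counter


def length_normalized_weights(terminology):
    """Assumed scheme: split a total weight of 1 across the words of one
    terminology, so a long synonym gets no more total weight than a short one."""
    words = terminology.lower().split()
    return {w: words.count(w) / len(words) for w in set(words)}


def inverse_terminology_frequency(terminologies):
    """Assumed IDF-style statistic: log(N / n_w), where N is the number of
    terminologies and n_w is how many of them contain the word w."""
    n = len(terminologies)
    counts = Counter(w for t in terminologies for w in set(t.lower().split()))
    return {w: math.log(n / c) for w, c in counts.items()}


# Hypothetical toy lexicon standing in for terminologies of the query domain.
lexicon = [
    "BSE",
    "bovine spongiform encephalopathy",
    "mad cow disease",
    "heart disease",
]

short_form = length_normalized_weights("BSE")
long_form = length_normalized_weights("bovine spongiform encephalopathy")
itf = inverse_terminology_frequency(lexicon)
```

Under these assumptions, the single word of "BSE" carries the same total weight as the three words of its long form combined, and a word shared by several terminologies, such as "disease", receives a lower inverse terminology frequency than a word specific to one terminology, such as "bovine".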

