Abstract
Although using domain-specific knowledge sources for information retrieval yields more accurate results than purely keyword-based methods, further improvements can be achieved by considering both the relations between concepts in an ontology and their statistical dependencies over the corpus. In this paper, we introduce a novel approach, concept-based pseudo-relevance feedback, for improving the accuracy of biomedical retrieval systems. The proposed method uses a hybrid retrieval algorithm, based on a combination of keyword- and concept-based approaches, to determine the relevance between queries and documents. It also uses a pseudo-relevance feedback mechanism to expand initial queries with auxiliary biomedical concepts extracted from the top-ranked results of the hybrid retrieval. Concept-based similarity enables the system to detect documents that are semantically related to a user's query even when they share no common keywords. In addition, expanding initial queries with the concepts introduced by pseudo-relevance feedback captures relations between queries and documents that rely on statistical dependencies between the concepts they contain; such relations may remain undetected if only the existing links between concepts in an external knowledge source are examined. The proposed approach is evaluated using the OHSUMED test collection and standard evaluation methods from the Text REtrieval Conference (TREC). Experimental results on MEDLINE documents in the OHSUMED collection show a 21% improvement in mean average precision over the keyword-based approach.
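To make the pipeline described above concrete, here is a minimal sketch of hybrid scoring followed by concept-based pseudo-relevance feedback and an average-precision check. It is not the authors' implementation: the abstract does not specify the similarity functions or parameters, so the cosine similarity, the linear mixing weight `alpha`, the feedback depth `k`, the number of added concepts `m`, the expansion weight `beta`, and the concept IDs such as `C_MI` are all hypothetical choices made for illustration. Documents are assumed to be pre-mapped to a bag of keywords and a bag of biomedical concept IDs.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse, non-negative weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    if dot == 0:
        return 0.0  # also avoids dividing by zero for empty vectors
    return dot / (sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values())))


def hybrid_score(query, doc, alpha=0.5):
    """Hypothetical hybrid score: alpha * keyword similarity + (1 - alpha) * concept similarity."""
    (q_terms, q_concepts), (d_terms, d_concepts) = query, doc
    return alpha * cosine(q_terms, d_terms) + (1 - alpha) * cosine(q_concepts, d_concepts)


def rank(query, docs, alpha=0.5):
    """Return doc IDs sorted by descending hybrid score (ties broken by ID)."""
    scored = [(hybrid_score(query, doc, alpha), doc_id) for doc_id, doc in docs.items()]
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [doc_id for _, doc_id in scored]


def expand_query(query, docs, ranking, k=2, m=2, beta=0.5):
    """Pseudo-relevance feedback: add the m most frequent concepts from the
    top-k documents (not already in the query) with weight beta."""
    q_terms, q_concepts = query
    pooled = Counter()
    for doc_id in ranking[:k]:
        pooled.update(docs[doc_id][1])  # concept bag of each feedback document
    ordered = sorted(pooled.items(), key=lambda x: (-x[1], x[0]))
    new_concepts = [c for c, _ in ordered if c not in q_concepts][:m]
    expanded = Counter(q_concepts)
    for c in new_concepts:
        expanded[c] += beta
    return q_terms, expanded


def average_precision(ranking, relevant):
    """Non-interpolated average precision of one ranked list."""
    hits, total = 0, 0.0
    for i, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


# Toy collection: (keyword bag, concept bag) per document; concept IDs are made up.
docs = {
    "d1": (Counter({"heart": 2, "attack": 1}), Counter({"C_MI": 1, "C_HEART": 1})),
    "d2": (Counter({"myocardial": 1, "infarction": 1}), Counter({"C_MI": 1, "C_ASPIRIN": 1})),
    "d3": (Counter({"aspirin": 1, "therapy": 1}), Counter({"C_ASPIRIN": 1})),
    "d4": (Counter({"panic": 1, "attack": 1}), Counter({"C_PANIC": 1})),
}
query = (Counter({"heart": 1, "attack": 1}), Counter({"C_MI": 1}))
relevant = {"d1", "d2", "d3"}  # toy relevance judgments

initial = rank(query, docs)
expanded_query = expand_query(query, docs, initial, k=2, m=2, beta=1.0)
reranked = rank(expanded_query, docs)
print(initial, average_precision(initial, relevant))
print(reranked, average_precision(reranked, relevant))
```

In this toy run, `d3` shares neither keywords nor concepts with the original query, so it initially ranks below the keyword-matching but irrelevant `d4`. Because `C_ASPIRIN` co-occurs with `C_MI` in a top-ranked document, feedback adds it to the query and `d3` moves up, which mirrors (on a tiny scale) the statistical-dependency effect the abstract describes. Mean average precision is then the mean of `average_precision` over all test queries.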
Cite this article
Jalali, V., Matash Borujerdi, M.R. Information retrieval with concept-based pseudo-relevance feedback in MEDLINE. Knowl Inf Syst 29, 237–248 (2011). https://doi.org/10.1007/s10115-010-0327-7