Information retrieval with concept-based pseudo-relevance feedback in MEDLINE

  • Short Paper
Knowledge and Information Systems

Abstract

Although using domain-specific knowledge sources for information retrieval yields more accurate results than pure keyword-based methods, further improvement can be achieved by considering both the relations between concepts in an ontology and their statistical dependencies over the corpus. In this paper, an innovative approach named concept-based pseudo-relevance feedback is introduced for improving the accuracy of biomedical retrieval systems. The proposed method uses a hybrid retrieval algorithm, based on a combination of keyword- and concept-based approaches, to determine the relevance of documents to queries. It also uses a pseudo-relevance feedback mechanism to expand initial queries with auxiliary biomedical concepts extracted from the top-ranked results of the hybrid retrieval. Concept-based similarity lets the system detect documents that are semantically close to a user's query even when they share no keywords with it. In addition, expanding initial queries with concepts introduced by pseudo-relevance feedback captures relations between queries and documents that rely on statistical dependencies between the concepts they contain. Such relations may remain undetected when only the existing links between concepts in an external knowledge source are examined. The proposed approach is evaluated using the OHSUMED test collection and standard evaluation methods from the Text REtrieval Conference (TREC). Experimental results on MEDLINE documents (in the OHSUMED collection) show a 21% improvement over the keyword-based approach in terms of mean average precision, which is a noticeable gain.
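The two mechanisms described in the abstract, hybrid keyword/concept scoring and concept-based query expansion from top-ranked results, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear interpolation weight `alpha`, the toy overlap measures standing in for keyword and concept similarity, the frequency-based choice of expansion concepts, the helper names, and the four hand-made documents with their concept labels.

```python
from collections import Counter

def keyword_score(query_terms, doc_terms):
    """Fraction of distinct query terms found in the document (toy stand-in)."""
    q = set(query_terms)
    if not q:
        return 0.0
    return len(q & set(doc_terms)) / len(q)

def concept_score(query_concepts, doc_concepts):
    """Jaccard overlap of concept sets (toy stand-in for concept similarity)."""
    q, d = set(query_concepts), set(doc_concepts)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)

def hybrid_score(query, doc, alpha=0.5):
    """Assumed linear mix of keyword and concept evidence; alpha is hypothetical."""
    return (alpha * keyword_score(query["terms"], doc["terms"])
            + (1 - alpha) * concept_score(query["concepts"], doc["concepts"]))

def rank(query, docs, alpha=0.5):
    """Return documents sorted by descending hybrid score."""
    return sorted(docs, key=lambda d: hybrid_score(query, d, alpha), reverse=True)

def expand_query(query, docs, top_k=2, n_new=1, alpha=0.5):
    """Pseudo-relevance feedback: treat the top_k results as relevant and add
    the n_new concepts that occur in most of them and are not in the query."""
    top = rank(query, docs, alpha)[:top_k]
    counts = Counter(
        c for d in top for c in set(d["concepts"]) if c not in query["concepts"]
    )
    new_concepts = [c for c, _ in counts.most_common(n_new)]
    return {"terms": list(query["terms"]),
            "concepts": list(query["concepts"]) + new_concepts}

# Hand-made toy collection; terms and concept labels are illustrative only.
docs = [
    {"id": "d1", "terms": ["heart", "attack", "aspirin"],
     "concepts": ["Myocardial Infarction", "Aspirin"]},
    {"id": "d2", "terms": ["myocardial", "infarction", "therapy"],
     "concepts": ["Myocardial Infarction", "Aspirin", "Thrombolytic Therapy"]},
    {"id": "d3", "terms": ["aspirin", "platelet"],
     "concepts": ["Aspirin", "Platelet Aggregation"]},
    {"id": "d4", "terms": ["knee", "surgery"],
     "concepts": ["Knee Joint"]},
]
query = {"terms": ["heart", "attack"], "concepts": ["Myocardial Infarction"]}

expanded = expand_query(query, docs)
before = [d["id"] for d in rank(query, docs)]
after = [d["id"] for d in rank(expanded, docs)]
```

In this toy run, document d3 shares neither a keyword nor a concept with the initial query, so its score is zero. After feedback adds the concept that co-occurs in the top-ranked results, d3 receives a non-zero score, which is the kind of statistically induced query-document relation the abstract refers to.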

Author information

Corresponding author

Correspondence to Vahid Jalali.

About this article

Cite this article

Jalali, V., Matash Borujerdi, M.R. Information retrieval with concept-based pseudo-relevance feedback in MEDLINE. Knowl Inf Syst 29, 237–248 (2011). https://doi.org/10.1007/s10115-010-0327-7

  • DOI: https://doi.org/10.1007/s10115-010-0327-7
