Abstract
Searching specialized collections, such as biomedical literature, typically requires intimate knowledge of specialized terminology. The search can therefore be a disappointing experience: a user who does not know the right terms, or who is unaware of synonyms and variations in terminology, may obtain low recall. We study the role of a thesaurus in the biomedical information retrieval process. We start by describing vocabulary mismatch problems between natural language queries and relevant documents in biomedical literature search; we provide a detailed case study and observe the impact of vocabulary mismatch on retrieval effectiveness. Additionally, we analyze the MeSH thesaurus terms used to index the documents in the collection. Based on our observations, we propose a method for exploiting the MeSH thesaurus to improve retrieval effectiveness and, more specifically, to increase recall. We carry out a series of thesaurus-based retrieval experiments that show substantial performance improvements. We conclude with a detailed analysis of the retrieval results.
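The vocabulary mismatch problem described in the abstract can be illustrated with a minimal sketch. This is not the method proposed in the paper: the thesaurus entries, the documents, the relevance judgments, and the helper names (`TOY_THESAURUS`, `expand`, `retrieve`, `recall`) are all hypothetical, and the matching is plain substring lookup rather than a real retrieval model. It only shows how adding synonyms from a thesaurus can raise recall when the query and a relevant document use different terms.

```python
# Toy illustration only: the thesaurus and documents below are invented,
# not taken from MeSH or from the paper's test collection.
TOY_THESAURUS = {
    "heart attack": {"myocardial infarction"},
    "myocardial infarction": {"heart attack"},
}

DOCS = {
    "d1": "aspirin after myocardial infarction",
    "d2": "heart attack risk factors",
    "d3": "aspirin dosage in children",
}

# Assumed relevance judgments for the query "heart attack".
RELEVANT = {"d1", "d2"}


def expand(query):
    """Return the query term together with its synonyms from the toy thesaurus."""
    return {query} | TOY_THESAURUS.get(query, set())


def retrieve(terms):
    """Return the ids of documents containing at least one of the terms."""
    return {doc_id for doc_id, text in DOCS.items() if any(t in text for t in terms)}


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)


baseline = retrieve({"heart attack"})
expanded = retrieve(expand("heart attack"))

print("baseline recall:", recall(baseline, RELEVANT))
print("expanded recall:", recall(expanded, RELEVANT))
```

In this toy setting the unexpanded query misses the document that uses the clinical term, while the expanded query retrieves both relevant documents. Real thesaurus-based expansion also has to deal with ambiguity and precision loss, which this sketch ignores.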
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
IJzereef, L., Kamps, J., de Rijke, M. (2005). Biomedical Retrieval: How Can a Thesaurus Help?. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE. OTM 2005. Lecture Notes in Computer Science, vol 3761. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575801_31
DOI: https://doi.org/10.1007/11575801_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29738-3
Online ISBN: 978-3-540-32120-0
eBook Packages: Computer Science (R0)