Ranking for Medical Annotation: Investigating Performance, Local Search and Homonymy Recognition

  • Conference paper
Knowledge Exploration in Life Science Informatics (KELSI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3303)

Abstract

In this paper we investigate several hypotheses concerning document relevance ranking for biological literature. More specifically, we focus on three topics: performance, the risk of local searching, and homonymy recognition. Surprisingly, we find that a quite simple ranker based on the occurrence of a single word performs best. Adding this word as a new search term to each query yields results comparable to elaborate state-of-the-art approaches. The risk of our local searching approach is found to be negligible. In some cases, retrieval from a large repository even yields worse results than local search on a smaller repository that contains only the documents returned by the current query. Removing automatically determined homonyms yields results almost indistinguishable from those of the original query, so the problem of homonymy in biological literature may have been overstated. In conclusion, our investigation of these three hypotheses has helped us decide implementation issues within our research projects and has opened interesting avenues for further research.
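
As an illustration of the kind of single-word ranker the abstract describes, the following is a minimal sketch, not the author's implementation. The abstract does not name the word, so the term "mutation" used in the usage note is purely hypothetical, as are the function names and the `AND` query syntax, which is assumed here only to show how a term might be appended to a query.

```python
import re
from typing import List


def contains_word(document: str, word: str) -> bool:
    """Return True if `word` occurs as a whole token in `document` (case-insensitive)."""
    tokens = re.findall(r"\w+", document.lower())
    return word.lower() in tokens


def rank_by_word(documents: List[str], word: str) -> List[str]:
    """Place documents containing `word` before those that do not.

    sorted() is stable, so documents within each group keep their
    original relative order.
    """
    return sorted(documents, key=lambda doc: not contains_word(doc, word))


def expand_query(query: str, word: str) -> str:
    """Append `word` to `query` as an additional required term.

    The "AND" operator is an assumed syntax, not taken from the paper.
    """
    return f"{query} AND {word}"
```

For example, `rank_by_word(abstracts, "mutation")` would move abstracts mentioning the hypothetical term to the top, and `expand_query("BRCA1", "mutation")` would build the corresponding expanded query string.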

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Seewald, A.K. (2004). Ranking for Medical Annotation: Investigating Performance, Local Search and Homonymy Recognition. In: López, J.A., Benfenati, E., Dubitzky, W. (eds) Knowledge Exploration in Life Science Informatics. KELSI 2004. Lecture Notes in Computer Science, vol 3303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30478-4_10

  • DOI: https://doi.org/10.1007/978-3-540-30478-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23927-7

  • Online ISBN: 978-3-540-30478-4

  • eBook Packages: Springer Book Archive
