Abstract
In this paper we investigate several hypotheses concerning document relevance ranking for biological literature. Specifically, we focus on three topics: ranking performance, the risk of local searching, and homonymy recognition. Surprisingly, we find that a very simple ranker based on the occurrence of a single word performs best. Adding this word as a new search term to each query yields results comparable to those of elaborate state-of-the-art approaches. The risk of our local searching approach is found to be negligible: in some cases, retrieval from a large repository even yields worse results than local search on a smaller repository that contains only the documents returned by the current query. Removing automatically determined homonyms yields results almost indistinguishable from those of the original query, which suggests that the problem of homonymy in biological literature may have been overstated. In conclusion, our investigation of these three hypotheses has helped us settle implementation issues within our research projects and has opened interesting avenues for further research.
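As a rough illustration of the single-word ranker and the query augmentation described in the abstract, the following minimal sketch may help. It is not the author's implementation: the function names, the whole-word and case-insensitive matching, the tie-breaking by original order, and the example word "mutation" are all assumptions made purely for illustration, since the abstract does not name the word or the matching rules.

```python
import re


def rank_by_word(documents, word):
    """Rank documents so that those containing `word` come first.

    `documents` is a list of (doc_id, text) pairs. Matching is whole-word
    and case-insensitive (an assumption; the abstract does not specify).
    Documents with the same score keep their original relative order.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    # sorted() is stable, so ties preserve the order returned by the query.
    ranked = sorted(documents, key=lambda doc: pattern.search(doc[1]) is None)
    return [doc_id for doc_id, _ in ranked]


def augment_query(query, word):
    """Append `word` to the query as an extra search term if it is missing."""
    terms = query.split()
    if word.lower() in (term.lower() for term in terms):
        return query
    return " ".join(terms + [word])


# Hypothetical example data; "mutation" is a placeholder word only.
docs = [
    ("a", "nothing relevant here"),
    ("b", "a Mutation was found in the gene"),
    ("c", "mutations differ between samples"),
    ("d", "the mutation is described again"),
]
ranking = rank_by_word(docs, "mutation")
augmented = augment_query("BRCA1 breast cancer", "mutation")
```

The two functions correspond to the two uses mentioned in the abstract: reordering an existing result list, and adding the word as a search term so that the retrieval system itself favours matching documents.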
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Seewald, A.K. (2004). Ranking for Medical Annotation: Investigating Performance, Local Search and Homonymy Recognition. In: López, J.A., Benfenati, E., Dubitzky, W. (eds) Knowledge Exploration in Life Science Informatics. KELSI 2004. Lecture Notes in Computer Science, vol 3303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30478-4_10
DOI: https://doi.org/10.1007/978-3-540-30478-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23927-7
Online ISBN: 978-3-540-30478-4
eBook Packages: Springer Book Archive