Ranking for Medical Annotation: Investigating Performance, Local Search and Homonymy Recognition

Seewald, Alexander K.

doi:10.1007/978-3-540-30478-4_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3303)

Included in the following conference series:

International Symposium on Knowledge Exploration in Life Science Informatics

Abstract

In this paper we investigate several hypotheses concerning document relevance ranking for biological literature. More specifically, we focus on three topics: performance, risk of local searching, and homonymy recognition. Surprisingly, we find that a quite simple ranker based on the occurrence of a single word performs best. Adding this word as a new search term to each query yields results comparable to elaborate state-of-the-art approaches. The risk of our local searching approach is found to be negligible. In some cases, retrieval from a large repository even yields worse results than local search on a smaller repository that contains only the documents returned by the current query. Removing automatically determined homonyms yields results almost indistinguishable from those of the original query, so the problem of homonymy in biological literature may have been overstated. In conclusion, investigating these three hypotheses has helped us decide implementation issues within our research projects and has opened interesting avenues for further research.
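
The single-word ranker and the query expansion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the word, the tokenization, or the query syntax, so everything here is an assumption: the function names `rank_by_single_word` and `expand_query`, the example word "disease", and the `(query) AND word` form are hypothetical and are not taken from the paper.

```python
import re
from typing import List, Tuple


def rank_by_single_word(docs: List[str], word: str) -> List[Tuple[int, str]]:
    """Rank documents by how often `word` occurs (assumed scoring rule)."""
    needle = word.lower()
    scored = []
    for doc in docs:
        # Assumed tokenization: lowercase alphanumeric runs.
        tokens = re.findall(r"[a-z0-9]+", doc.lower())
        scored.append((tokens.count(needle), doc))
    # Highest count first; Python's sort is stable, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def expand_query(query: str, word: str) -> str:
    """Add the word as an extra search term (hypothetical boolean syntax)."""
    return f"({query}) AND {word}"


if __name__ == "__main__":
    abstracts = [
        "the mutation causes disease",
        "protein structure only",
        "disease disease variant",
    ]
    # "disease" is a placeholder word chosen for illustration only.
    for score, text in rank_by_single_word(abstracts, "disease"):
        print(score, text)
    print(expand_query("BRCA1", "disease"))
```

Under these assumptions the ranker needs no training and no index beyond the documents returned by the current query, which is the sense in which the search is local.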

Author information

Authors and Affiliations

Austrian Research Institute for Artificial Intelligence, Freyung 6/6, A-1010, Vienna, Austria
Alexander K. Seewald
Editor information

Editors and Affiliations

Faculty of Sciences, School of Mathematics and Computing, University of Southern Queensland, 4350, Toowoomba, QLD, Australia
Jesús A. López
Istituto di Ricerche Farmacologiche “Mario Negri”, Via Eritrea 62, 20157, Milano, Italy
Emilio Benfenati
School of Biomedical Sciences, Bioinformatics Research Group, University of Ulster, Cromore Road, BT52 1SA, Coleraine, Northern Ireland, UK
Werner Dubitzky

About this paper

Cite this paper

Seewald, A.K. (2004). Ranking for Medical Annotation: Investigating Performance, Local Search and Homonymy Recognition. In: López, J.A., Benfenati, E., Dubitzky, W. (eds) Knowledge Exploration in Life Science Informatics. KELSI 2004. Lecture Notes in Computer Science, vol 3303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30478-4_10

DOI: https://doi.org/10.1007/978-3-540-30478-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23927-7
Online ISBN: 978-3-540-30478-4
eBook Packages: Springer Book Archive