CONANN: An Online Biomedical Concept Annotator

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4544)

Abstract

We describe CONANN, a biomedical concept annotator designed for online environments. CONANN takes a biomedical source phrase and finds the best-matching biomedical concept in a domain resource such as the United States National Library of Medicine’s Unified Medical Language System (UMLS) Metathesaurus. It uses an incremental filtering approach to narrow a list of candidate phrases before deciding on a best match. We show that this approach improves annotation speed over an existing state-of-the-art concept annotator, which facilitates the use of concept annotation in online environments. Our main contributions are 1) the design of a phrase-unit concept annotator that is more readily usable in online environments than existing systems, 2) the introduction of a model, called Inverse Phrase Frequency, which uses semantically focused words in a given ontology (e.g., UMLS) to measure coverage, and 3) the use of two different filters to measure coverage and coherence between a source phrase and a domain-specific candidate phrase. An intrinsic evaluation comparing CONANN’s concept output with that of a state-of-the-art concept annotator shows that our system has an annotation precision ranging from 90% for exact concept matching to 95% for relaxed concept matching, while its average phrase annotation is eighteen times faster. In addition, an extrinsic evaluation that uses the generated concepts in a text summarization task shows no significant degradation when using CONANN.
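The abstract does not define Inverse Phrase Frequency or the two filters, so the following is only an illustrative sketch of how an incremental filtering annotator of this kind might look, not the authors' implementation. It assumes that Inverse Phrase Frequency is computed by analogy with inverse document frequency over the phrases of a resource, that coverage is the share of a source phrase's weight found in a candidate, and that coherence is a word-order score based on the longest common subsequence. All function names, the threshold, and the toy phrase list are hypothetical.

```python
import math
from collections import Counter


def build_ipf(phrases):
    """Assumed IPF: log(N / number of phrases containing the word)."""
    n = len(phrases)
    phrase_freq = Counter()
    for phrase in phrases:
        phrase_freq.update(set(phrase.lower().split()))
    return {word: math.log(n / count) for word, count in phrase_freq.items()}


def coverage(source, candidate, ipf):
    """Share of the source phrase's IPF weight that the candidate contains."""
    src = set(source.lower().split())
    cand = set(candidate.lower().split())
    total = sum(ipf.get(word, 0.0) for word in src)
    if total == 0:
        return 0.0
    return sum(ipf.get(word, 0.0) for word in src & cand) / total


def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def coherence(source, candidate):
    """Assumed coherence: shared in-order words relative to the longer phrase."""
    s, c = source.lower().split(), candidate.lower().split()
    if not s or not c:
        return 0.0
    return lcs_len(s, c) / max(len(s), len(c))


def annotate(source, phrases, ipf, min_coverage=0.5):
    """Apply the coverage filter first, then rank survivors by coherence."""
    survivors = [p for p in phrases if coverage(source, p, ipf) >= min_coverage]
    if not survivors:
        return None
    return max(survivors, key=lambda p: (coherence(source, p), coverage(source, p, ipf)))


# Toy stand-in for a domain resource; real candidate phrases would come from UMLS.
toy_phrases = ["heart attack", "heart failure", "lung cancer", "heart"]
toy_ipf = build_ipf(toy_phrases)
print(annotate("acute heart attack", toy_phrases, toy_ipf))
```

Applying the cheap coverage filter before the costlier coherence score is what makes the filtering incremental in this sketch: most candidates are discarded before any sequence comparison is run.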

Editor information

Sarah Cohen-Boulakia, Val Tannen

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Reeve, L.H., Han, H. (2007). CONANN: An Online Biomedical Concept Annotator. In: Cohen-Boulakia, S., Tannen, V. (eds) Data Integration in the Life Sciences. DILS 2007. Lecture Notes in Computer Science, vol 4544. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73255-6_21

  • DOI: https://doi.org/10.1007/978-3-540-73255-6_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73254-9

  • Online ISBN: 978-3-540-73255-6

  • eBook Packages: Computer Science, Computer Science (R0)
