Entity Recognition in Parallel Multi-lingual Biomedical Corpora: The CLEF-ER Laboratory Overview

Rebholz-Schuhmann, Dietrich; Clematide, Simon; Rinaldi, Fabio; Kafkas, Senay; van Mulligen, Erik M.; Bui, Chinh; Hellrich, Johannes; Lewin, Ian; Milward, David; Poprat, Michael; Jimeno-Yepes, Antonio; Hahn, Udo; Kors, Jan A.

doi:10.1007/978-3-642-40802-1_32


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8138)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages


Abstract

The identification and normalisation of biomedical entities in the scientific literature has a long tradition, and a number of challenges have contributed to the development of reliable solutions. Increasingly, patient records are processed to align their content with other biomedical data resources, but this approach requires analysing documents in different languages across Europe [1,2].

The CLEF-ER challenge was organized by the Mantra project partners to improve entity recognition (ER) in multilingual documents. Several corpora in different languages, i.e. Medline titles, EMEA documents and patent claims, were prepared to enable ER in parallel documents. The participants were asked to annotate entity mentions with concept unique identifiers (CUIs) in the documents of their preferred non-English language.

The evaluation determines the number of correctly identified entity mentions against a silver standard (Task A) and the performance measures for the identification of CUIs in the non-English corpora (Task B). The participants could make use of the prepared terminological resources for entity normalisation and of the English silver standard corpora (SSCs) as input for concept candidates in the non-English documents.
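As a rough illustration of scoring annotations against a silver standard, the sketch below computes precision, recall and F1 for mention boundaries and for CUI assignments. It is not the official CLEF-ER scorer: the abstract does not name the measures or the matching criterion, so precision/recall/F1, exact-boundary matching, the `Mention` record and the CUI values shown are all assumptions made for the example.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    """Hypothetical annotation record: a character span tagged with a CUI."""
    doc_id: str
    start: int
    end: int
    cui: str


def prf(predicted: set, reference: set) -> tuple[float, float, float]:
    """Precision, recall and F1 of a predicted set against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def score_mentions(pred: list[Mention], silver: list[Mention]):
    """Mention-level score; assumes exact boundary matching, ignoring CUIs."""
    return prf({(m.doc_id, m.start, m.end) for m in pred},
               {(m.doc_id, m.start, m.end) for m in silver})


def score_cuis(pred: list[Mention], silver: list[Mention]):
    """Concept-level score; compares the CUIs assigned per document."""
    return prf({(m.doc_id, m.cui) for m in pred},
               {(m.doc_id, m.cui) for m in silver})


# Made-up placeholder CUIs, for illustration only.
silver = [Mention("d1", 0, 7, "C0000001"), Mention("d1", 12, 25, "C0000002")]
pred = [Mention("d1", 0, 7, "C0000001"),
        Mention("d1", 12, 25, "C0000009"),   # right span, wrong concept
        Mention("d1", 30, 35, "C0000002")]   # right concept, spurious span

mention_scores = score_mentions(pred, silver)
cui_scores = score_cuis(pred, silver)
```

Keeping the two scores separate shows how a system can find the right spans while assigning the wrong concepts, and vice versa.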

The participants used different approaches, including translation techniques and word or phrase alignments, in addition to lexical lookup and other text mining techniques. Performance on Tasks A and B was lower for the patent corpus than for Medline titles and EMEA documents. In the patent documents, chemical entities were identified with higher performance, whereas the other two document types cover a higher proportion of medical terms. The number of novel terms provided from all corpora is currently under investigation.
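One way to picture lexical lookup combined with concept candidates from the parallel English text is the minimal sketch below. It does not reproduce any participant's system: the `annotate` function, the toy German lexicon and the placeholder CUIs are hypothetical, and overlapping matches are not resolved.

```python
import re


def annotate(text, lexicon, candidate_cuis=None):
    """Dictionary lookup of lexicon terms in `text`.

    lexicon: dict mapping a term to a CUI (hypothetical toy resource).
    candidate_cuis: optional set of CUIs, e.g. those found in the aligned
        English unit; when given, other concepts are skipped.
    Returns a sorted list of (start, end, cui) tuples.
    """
    annotations = []
    for term, cui in lexicon.items():
        if candidate_cuis is not None and cui not in candidate_cuis:
            continue
        # Whole-word, case-insensitive match of the term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), cui))
    return sorted(annotations)


# Toy non-English lexicon with made-up placeholder CUIs.
lexicon = {
    "aspirin": "C0000001",
    "kopfschmerzen": "C0000002",
    "schmerzen": "C0000003",
}
text = "Aspirin bei Kopfschmerzen"

all_hits = annotate(text, lexicon)
filtered_hits = annotate(text, lexicon, candidate_cuis={"C0000001"})
```

Restricting lookup to candidate CUIs trades recall for precision: concepts absent from the English annotations are never proposed, even when the non-English lexicon contains them.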

Altogether, the CLEF-ER challenge demonstrates the performance of annotation solutions in different languages against an SSC.


References

1. Roberts, A., Gaizauskas, R., Hepple, M., Davis, N., Demetriou, G., Guo, Y., Kola, J.S., Roberts, I., Setzer, A., Tapuria, A., et al.: The CLEF corpus: semantic annotation of clinical text. In: AMIA Annual Symposium Proceedings, vol. 2007, p. 625. American Medical Informatics Association (2007)
2. Lussier, Y.A., Shagina, L., Friedman, C.: Automating icd-9-cm encoding using medical language processing: A feasibility study. In: Proceedings of the AMIA Symposium, p. 1072. American Medical Informatics Association (2000)
3. Catarci, T., Ferro, N., Forner, P., Hiemstra, D., Karlgren, J., Penas, A., Santucci, G., Womser-Hacker, C.: CLEF 2012: information access evaluation meets multilinguality, multimodality, and visual analytics. ACM SIGIR Forum 46, 29–33 (2012)
4. Roda, G., Tait, J., Piroi, F., Zenz, V.: CLEF-IP 2009: retrieval experiments in the Intellectual Property domain. In: Peters, C., Di Nunzio, G.M., Kurimo, M., Mandl, T., Mostefa, D., Peñas, A., Roda, G. (eds.) CLEF 2009. LNCS, vol. 6241, pp. 385–409. Springer, Heidelberg (2010)
5. Krallinger, M., Leitner, F., Rodriguez-Penagos, C., Valencia, A.: Overview of the protein-protein interaction annotation extraction task of BioCreative II. Genome Biology 9(suppl. 2), S4 (2008), http://genomebiology.com/2008/9/S2/S4
6. Morgan, A., Lu, Z., Wang, X., Cohen, A., Fluck, J., Ruch, P., Divoli, A., Fundel, K., Leaman, R., Hakenberg, J., Sun, C., Liu, H.H., Torres, R., Krauthammer, M., Lau, W., Liu, H., Hsu, C.N., Schuemie, M., Cohen, K.B., Hirschman, L.: Overview of BioCreative II gene normalization. Genome Biology 9(suppl. 2), S3 (2008), http://genomebiology.com/2008/9/S2/S3
7. Cohen, K.B., Demner-Fushman, D., Ananiadou, S., Pestian, J., Tsujii, J., Webber, B. (eds.): Proceedings of the BioNLP 2009 Workshop. Association for Computational Linguistics, Boulder (2009), http://www.aclweb.org/anthology/W09-13
8. Rebholz-Schuhmann, D., Yepes, A.J., Mulligen, E.M.V., Kang, N., Kors, J., Milward, D., Corbett, P., Buyko, E., Beisswanger, E., Hahn, U.: CALBC silver standard corpus. Journal of Bioinformatics and Computational Biology 8, 163–179 (2010)
9. Rebholz-Schuhmann, D., Jimeno-Yepes, A., Li, C., Kafkas, S., Lewin, I., Kang, N., Corbett, P., Milward, D., Buyko, E., Beisswanger, E., Hornbostel, K., Kouznetsov, A., Witte, R., Laurila, J., Baker, C., Kuo, C.J., Clematide, S., Rinaldi, F., Farkas, R., Móra, G., Hara, K., Furlong, L., Rautschka, M., Lara Neves, M., Pascual-Montano, A., Wei, Q., Collier, N., Mahbub Chowdhury, M.F., Lavelli, A., Berlanga, R., Morante, R., Van Asch, V., Daelemans, W., Marina, J., van Mulligen, E., Kors, J., Hahn, U.: Assessment of NER solutions against the first and second CALBC Silver Standard Corpus. J. Biomedical Semantics 2(suppl. 5), S11 (2011)
10. Hersh, W., Voorhees, E.: TREC genomics special issue overview. Inf. Retr. Boston 12, 1–15 (2009)
11. Lu, Z.: PubMed and beyond: a survey of web tools for searching biomedical literature. Database (Oxford), 2011:baq036 (2011)
12. Rebholz-Schuhmann, D., Clematide, S., Rinaldi, F., Kafkas, S., van Mulligen, E.M., Bui, C., Hellrich, J., Lewin, I., Milward, D., Poprat, M., Jimeno-Yepes, A., Hahn, U., Kors, J.A.: Multilingual semantic resources and parallel corpora in the biomedical domain: the CLEF-ER challenge. In: Proceedings CLEF Conference, vol. 2013 (2013)
13. Bodenreider, O.: The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32, D267–D270 (2004)
14. Brown, E.G., Wood, L., Wood, S.: The medical dictionary for regulatory activities (MedDRA). Drug Safety 20(2), 109–117 (1999)
15. Stearns, M.Q., Price, C., Spackman, K.A., Wang, A.Y.: SNOMED clinical terms: overview of the development process and project status. In: Proceedings of the AMIA Symposium, vol. 662, American Medical Informatics Association (2001)
16. Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L.J., Eilbeck, K., Ireland, A., Mungall, C.J., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S.A., Scheuermann, R.H., Shah, N., Whetzel, P.L., Lewis, S.: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol. 25, 1251–1255 (2007)
17. Lewin, I., Kafkas, S., Rebholz-Schuhmann, D.: Centroids: Gold standards with distributional variations. In: Proceedings of the Language Resources Evaluation Conference, Istanbul, Turkey (2012)
18. Lewin, I., Clematide, S.: Deriving the Mantra Silver Standard. In: Proceedings CLEF Conference, vol. 2013 (2013)


Author information

Authors and Affiliations

Department of Computational Linguistics, University of Zürich, Switzerland
Dietrich Rebholz-Schuhmann, Simon Clematide & Fabio Rinaldi
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, U.K.
Dietrich Rebholz-Schuhmann & Senay Kafkas
Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
Erik M. van Mulligen, Chinh Bui & Jan A. Kors
Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Fürstengraben 30, D-07743, Jena, Germany
Johannes Hellrich & Udo Hahn
Linguamatics Ltd, 324 Science Park, Milton Road, Cambridge, CB4 0WG, UK
Ian Lewin & David Milward
Averbis GmbH, Tennenbacher Strasse 11, D-79106, Freiburg, Germany
Michael Poprat
Victoria Research Laboratory, National ICT Australia, Melbourne, Australia
Antonio Jimeno-Yepes


Editor information

Editors and Affiliations

Center for the Evaluation of Language and Communication Technologies (CELCT), via alla Cascata 56/c, 38123, Povo, Italy
Pamela Forner
HES-SO Valais, University of Applied Sciences Western Switzerland, Technopôle 3, 3960, Sierre, Switzerland
Henning Müller
Departamento de Sistemas Informáticos y Computación, Universitat Politècnica de València, Camino de Vera s/n, 46071, València, Spain
Roberto Paredes
Departamento de Sistemas Informáticos y Computación, Universitat Politècnica de València, Camino de Vera s/n, 46022, València, Spain
Paolo Rosso
Bauhaus-Universität Weimar, Bauhausstraße 11, 99423, Weimar, Germany
Benno Stein

About this paper

Cite this paper

Rebholz-Schuhmann, D. et al. (2013). Entity Recognition in Parallel Multi-lingual Biomedical Corpora: The CLEF-ER Laboratory Overview. In: Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B. (eds) Information Access Evaluation. Multilinguality, Multimodality, and Visualization. CLEF 2013. Lecture Notes in Computer Science, vol 8138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40802-1_32


DOI: https://doi.org/10.1007/978-3-642-40802-1_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40801-4
Online ISBN: 978-3-642-40802-1
eBook Packages: Computer Science, Computer Science (R0)
