Overview of ChEMU 2020: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12260)

Abstract

In this paper, we provide an overview of the Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020). The ChEMU evaluation lab focuses on information extraction over chemical reactions from patent texts. Using the ChEMU corpus of 1500 “snippets” (text segments) sampled from 170 patent documents and annotated by chemical experts, we defined two key information extraction tasks. Task 1 addresses chemical named entity recognition, the identification of chemical compounds and their specific roles in chemical reactions. Task 2 focuses on event extraction, the identification of reaction steps, relating the chemical compounds involved in a chemical reaction. Herein, we describe the resources created for these tasks and the evaluation methodology adopted. We also provide a brief summary of the participants of this lab and the results obtained across 46 runs from 11 teams, finding that several submissions achieve substantially better results than our baseline methods.


Notes

  1. https://www.reaxys.com Reaxys® Copyright ©2020 Elsevier Limited except certain content provided by third parties. Reaxys is a trademark of Elsevier Limited.

  2. http://mallet.cs.umass.edu/.

  3. http://chemu.eng.unimelb.edu.au/.

  4. The run that we received from team Lasige_BioTM is not included in the table due to a technical issue found in this run.

  5. The run that we received from the Lasige_BioTM team is not included in the table as there was a technical issue in this run. Two runs from Melaxtech, Melaxtech-run2 and Melaxtech-run3, had very low performance, due to an error in their data pre-processing step.



Acknowledgements

We are grateful for the detailed excerption and annotation work of the domain experts that support Reaxys, and the support of Ivan Krstic, Director of Chemistry Solutions at Elsevier. Funding for the ChEMU project is provided by an Australian Research Council Linkage Project, project number LP160101469, and Elsevier.

Author information

Corresponding author

Correspondence to Karin Verspoor.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

He, J., et al. (2020). Overview of ChEMU 2020: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents. In: Arampatzis, A., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2020. Lecture Notes in Computer Science, vol 12260. Springer, Cham. https://doi.org/10.1007/978-3-030-58219-7_18

  • DOI: https://doi.org/10.1007/978-3-030-58219-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58218-0

  • Online ISBN: 978-3-030-58219-7

  • eBook Packages: Computer Science, Computer Science (R0)
