DuEE-Fin: A Large-Scale Dataset for Document-Level Event Extraction

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13551)

Abstract

To address the data scarcity problem in document-level event extraction, we present DuEE-Fin, a large-scale benchmark consisting of 15,000+ events categorized into 13 event types and 81,000+ event arguments mapped to 92 argument roles. We constructed DuEE-Fin from real-world Chinese financial news, in which one document may contain several events, multiple arguments may share the same argument role, and one argument may play different roles in different events. The dataset therefore poses considerable challenges, such as multi-event recognition and multi-value argument identification, which are regarded as key issues for document-level event extraction. Along with DuEE-Fin, we hosted an open competition, which attracted 1,690 teams and achieved exciting results. We also ran experiments on DuEE-Fin with the most popular document-level event extraction systems; the results showed that even some state-of-the-art (SOTA) models performed poorly on our data. These challenges indicate that more effective methods are needed.
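
The three properties named above (several events per document, several arguments sharing one role, and one argument playing different roles in different events) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names, the event types and role labels (`equity_pledge`, `pledgee`, and so on), and the English placeholder entities are hypothetical and are not taken from the DuEE-Fin schema or its release format.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Argument:
    role: str  # hypothetical role label, not a DuEE-Fin role name
    text: str  # surface string of the argument


@dataclass
class Event:
    event_type: str  # hypothetical event type label
    arguments: List[Argument] = field(default_factory=list)


@dataclass
class Document:
    text: str
    events: List[Event] = field(default_factory=list)


def multi_value_roles(event: Event) -> Dict[str, Set[str]]:
    """Return roles that are filled by more than one distinct argument in one event."""
    values: Dict[str, Set[str]] = defaultdict(set)
    for arg in event.arguments:
        values[arg.role].add(arg.text)
    return {role: texts for role, texts in values.items() if len(texts) > 1}


def shared_arguments(doc: Document) -> Dict[str, Set[Tuple[int, str]]]:
    """Return argument strings that appear in several events under different roles."""
    seen: Dict[str, Set[Tuple[int, str]]] = defaultdict(set)
    for idx, event in enumerate(doc.events):
        for arg in event.arguments:
            seen[arg.text].add((idx, arg.role))
    return {
        text: pairs
        for text, pairs in seen.items()
        if len({i for i, _ in pairs}) > 1 and len({r for _, r in pairs}) > 1
    }


# A made-up document with two events (placeholder entities, not real data).
doc = Document(
    text="(placeholder news text)",
    events=[
        Event("equity_pledge", [
            Argument("pledger", "Company A"),
            Argument("pledgee", "Bank B"),
            Argument("pledgee", "Bank C"),  # two arguments share one role
        ]),
        Event("acquisition", [
            Argument("acquirer", "Bank B"),  # same argument, different role
            Argument("target", "Company D"),
        ]),
    ],
)

print(len(doc.events))                   # number of events in the document
print(multi_value_roles(doc.events[0]))  # roles with several values
print(shared_arguments(doc))             # arguments reused across events
```

A system that emits a single event per document, or a single value per role, cannot reproduce a record of this shape, which is one way to read the multi-event and multi-value challenges described in the abstract.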

Notes

  1. https://aistudio.baidu.com/aistudio/competition/detail/46/0/task-definition

Author information

Corresponding author

Correspondence to Cuiyun Han.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Han, C., Zhang, J., Li, X., Xu, G., Peng, W., Zeng, Z. (2022). DuEE-Fin: A Large-Scale Dataset for Document-Level Event Extraction. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science, vol 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_14

  • DOI: https://doi.org/10.1007/978-3-031-17120-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17119-2

  • Online ISBN: 978-3-031-17120-8

  • eBook Packages: Computer Science, Computer Science (R0)
